Thread: oracle to psql migration - slow query in postgres
We are in the process of testing migration of our oracle data warehouse over to postgres. A potential showstopper are full table scans on our members table. We can't function on postgres effectively unless index scans are employed. I'm thinking I don't have something set correctly in my postgresql.conf file, but I'm not sure what. This table has approximately 300million rows. Version: SELECT version(); version ------------------------------------------------------------------------------------------------------------------ PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit We have 4 quad-core processors and 32GB of RAM. The below query uses the members_sorted_idx_001 index in oracle, but in postgres, the optimizer chooses a sequential scan. explain analyze create table tmp_srcmem_emws1 as select emailaddress, websiteid from members where emailok = 1 and emailbounced = 0; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------ Seq Scan on members (cost=0.00..14137154.64 rows=238177981 width=29) (actual time=0.052..685834.785 rows=236660930 loops=1) Filter: ((emailok = 1::numeric) AND (emailbounced = 0::numeric)) Total runtime: 850306.220 ms (3 rows) show shared_buffers ; shared_buffers ---------------- 7680MB (1 row) show effective_cache_size ; effective_cache_size ---------------------- 22GB (1 row) show work_mem ; work_mem ---------- 768MB (1 row) show enable_seqscan ; enable_seqscan ---------------- on (1 row) Below are the data definitions for the table/indexes in question: \d members Table "members" Column | Type | Modifiers ---------------------+-----------------------------+----------- memberid | numeric | not null firstname | character varying(50) | lastname | character varying(50) | emailaddress | character varying(50) | password | character varying(50) | address1 | character varying(50) | address2 | character varying(50) | city | character varying(50) | statecode | character varying(50) | zipcode | character varying(50) | birthdate | date | emailok | numeric(2,0) | gender | character varying(1) | addeddate | timestamp without time zone | emailbounced | numeric(2,0) | changedate | timestamp without time zone | optoutsource | character varying(100) | websiteid | numeric | promotionid | numeric | sourceid | numeric | siteid | character varying(64) | srcwebsiteid | numeric | homephone | character varying(20) | homeareacode | character varying(10) | campaignid | numeric | srcmemberid | numeric | optoutdate | date | regcomplete | numeric(1,0) | regcompletesourceid | numeric | ipaddress | character varying(25) | pageid | numeric | streetaddressstatus | numeric(1,0) | middlename | character varying(50) | optinprechecked | numeric(1,0) | optinposition | numeric | homephonestatus | numeric | addeddate_id | numeric | changedate_id | numeric | rpmindex | numeric | optmode | numeric(1,0) | countryid | numeric | confirmoptin | numeric(2,0) | bouncedate | date | memberageid | numeric | sourceid2 | numeric | remoteuserid | character varying(50) | goal | numeric(1,0) | flowdepth | numeric | pagetype | numeric | savepassword | character varying(50) | customerprofileid | numeric | Indexes: "email_website_unq" UNIQUE, btree (emailaddress, websiteid), tablespace "members_idx" "member_addeddateid_idx" btree (addeddate_id), tablespace "members_idx" "member_changedateid_idx" btree (changedate_id), tablespace "members_idx" "members_fdate_idx" btree (to_char_year_month(addeddate)), tablespace "esave_idx" "members_memberid_idx" btree (memberid), tablespace "members_idx" "members_mid_emailok_idx" btree (memberid, emailaddress, zipcode, firstname, emailok), tablespace "members_idx" "members_sorted_idx_001" btree (websiteid, emailok, emailbounced, addeddate, memberid, zipcode, statecode, emailaddress), tablespace "members_idx" "members_src_idx" btree (websiteid, emailbounced, sourceid), tablespace "members_idx" "members_wid_idx" btree (websiteid), tablespace "members_idx" select tablename, indexname, tablespace, indexdef from pg_indexes where tablename = 'members'; -[ RECORD 1 ]---------------------------------------------------------------------------------------------------------------------------------------------------- tablename | members indexname | members_fdate_idx tablespace | esave_idx indexdef | CREATE INDEX members_fdate_idx ON members USING btree (to_char_year_month(addeddate)) -[ RECORD 2 ]---------------------------------------------------------------------------------------------------------------------------------------------------- tablename | members indexname | member_changedateid_idx tablespace | members_idx indexdef | CREATE INDEX member_changedateid_idx ON members USING btree (changedate_id) -[ RECORD 3 ]---------------------------------------------------------------------------------------------------------------------------------------------------- tablename | members indexname | member_addeddateid_idx tablespace | members_idx indexdef | CREATE INDEX member_addeddateid_idx ON members USING btree (addeddate_id) -[ RECORD 4 ]---------------------------------------------------------------------------------------------------------------------------------------------------- tablename | members indexname | members_wid_idx tablespace | members_idx indexdef | CREATE INDEX members_wid_idx ON members USING btree (websiteid) -[ RECORD 5 ]---------------------------------------------------------------------------------------------------------------------------------------------------- tablename | members indexname | members_src_idx tablespace | members_idx indexdef | CREATE INDEX members_src_idx ON members USING btree (websiteid, emailbounced, sourceid) -[ RECORD 6 ]---------------------------------------------------------------------------------------------------------------------------------------------------- tablename | members indexname | members_sorted_idx_001 tablespace | members_idx indexdef | CREATE INDEX members_sorted_idx_001 ON members USING btree (websiteid, emailok, emailbounced, addeddate, memberid, zipcode, statecode, emailaddress) -[ RECORD 7 ]---------------------------------------------------------------------------------------------------------------------------------------------------- tablename | members indexname | members_mid_emailok_idx tablespace | members_idx indexdef | CREATE INDEX members_mid_emailok_idx ON members USING btree (memberid, emailaddress, zipcode, firstname, emailok) -[ RECORD 8 ]---------------------------------------------------------------------------------------------------------------------------------------------------- tablename | members indexname | members_memberid_idx tablespace | members_idx indexdef | CREATE INDEX members_memberid_idx ON members USING btree (memberid) -[ RECORD 9 ]---------------------------------------------------------------------------------------------------------------------------------------------------- tablename | members indexname | email_website_unq tablespace | members_idx indexdef | CREATE UNIQUE INDEX email_website_unq ON members USING btree (emailaddress, websiteid) This table has also been vacuumed analyzed as well: select * from pg_stat_all_tables where relname = 'members'; -[ RECORD 1 ]----+------------------------------ relid | 3112786 schemaname | xxxxx relname | members seq_scan | 298 seq_tup_read | 42791828896 idx_scan | 31396925 idx_tup_fetch | 1083796963 n_tup_ins | 291308316 n_tup_upd | 0 n_tup_del | 4188020 n_tup_hot_upd | 0 n_live_tup | 285364632 n_dead_tup | 109658 last_vacuum | 2010-10-12 20:26:01.227393-04 last_autovacuum | last_analyze | 2010-10-12 20:28:01.105656-04 last_autoanalyze | 2010-09-16 20:50:00.712418-04
On Thu, Oct 14, 2010 at 12:43 PM, Tony Capobianco <tcapobianco@prospectiv.com> wrote: > We have 4 quad-core processors and 32GB of RAM. The below query uses > the members_sorted_idx_001 index in oracle, but in postgres, the > optimizer chooses a sequential scan. > > explain analyze create table tmp_srcmem_emws1 > as > select emailaddress, websiteid > from members > where emailok = 1 > and emailbounced = 0; Maybe a couple indexes to try: create index members_emailok_emailbounced_idx on members (emailok,emailbounced); or a functional index (will likely be smaller, depending on the contents of your table): create index members_emailok_emailbounced_idx on members (emailok,emailbounced) where emailok = 1 and emailbounced = 0; -- if you use that combination of 1 and 0 regularly
On 10/14/10 21:43, Tony Capobianco wrote: > We have 4 quad-core processors and 32GB of RAM. The below query uses > the members_sorted_idx_001 index in oracle, but in postgres, the > optimizer chooses a sequential scan. > > explain analyze create table tmp_srcmem_emws1 > as > select emailaddress, websiteid > from members > where emailok = 1 > and emailbounced = 0; > QUERY > PLAN > ------------------------------------------------------------------------------------------------------------------------------ > Seq Scan on members (cost=0.00..14137154.64 rows=238177981 width=29) > (actual time=0.052..685834.785 rows=236660930 loops=1) > Filter: ((emailok = 1::numeric) AND (emailbounced = 0::numeric)) > Total runtime: 850306.220 ms > (3 rows) > Indexes: > "email_website_unq" UNIQUE, btree (emailaddress, websiteid), > tablespace "members_idx" > "member_addeddateid_idx" btree (addeddate_id), tablespace > "members_idx" > "member_changedateid_idx" btree (changedate_id), tablespace > "members_idx" > "members_fdate_idx" btree (to_char_year_month(addeddate)), > tablespace "esave_idx" > "members_memberid_idx" btree (memberid), tablespace "members_idx" > "members_mid_emailok_idx" btree (memberid, emailaddress, zipcode, > firstname, emailok), tablespace "members_idx" > "members_sorted_idx_001" btree (websiteid, emailok, emailbounced, > addeddate, memberid, zipcode, statecode, emailaddress), tablespace > "members_idx" > "members_src_idx" btree (websiteid, emailbounced, sourceid), > tablespace "members_idx" > "members_wid_idx" btree (websiteid), tablespace "members_idx" PostgreSQL doesn't fetch data directly from indexes, so there is no way for it to reasonably use an index declared like: "members_sorted_idx_001" btree (websiteid, emailok, emailbounced, addeddate, memberid, zipcode, statecode, emailaddress) You need a direct index on the fields you are using in your query, i.e. an index on (emailok, emailbounced). OTOH, those columns look boolean-like. It depends on what your data set is, but if the majority of records contain (emailok=1 and emailbounced=0) an index may not help you much.
2010/10/14 Tony Capobianco <tcapobianco@prospectiv.com>: > We are in the process of testing migration of our oracle data warehouse > over to postgres. A potential showstopper are full table scans on our > members table. We can't function on postgres effectively unless index > scans are employed. I'm thinking I don't have something set correctly > in my postgresql.conf file, but I'm not sure what. > > This table has approximately 300million rows. and your query grab rows=236 660 930 of them. An index might be useless in this situation. > > Version: > SELECT version(); > > version > ------------------------------------------------------------------------------------------------------------------ > PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) > 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit > > We have 4 quad-core processors and 32GB of RAM. The below query uses > the members_sorted_idx_001 index in oracle, but in postgres, the > optimizer chooses a sequential scan. > > explain analyze create table tmp_srcmem_emws1 > as > select emailaddress, websiteid > from members > where emailok = 1 > and emailbounced = 0; > QUERY > PLAN > ------------------------------------------------------------------------------------------------------------------------------ > Seq Scan on members (cost=0.00..14137154.64 rows=238177981 width=29) > (actual time=0.052..685834.785 rows=236660930 loops=1) > Filter: ((emailok = 1::numeric) AND (emailbounced = 0::numeric)) > Total runtime: 850306.220 ms > (3 rows) > > show shared_buffers ; > shared_buffers > ---------------- > 7680MB > (1 row) > > show effective_cache_size ; > effective_cache_size > ---------------------- > 22GB > (1 row) > > show work_mem ; > work_mem > ---------- > 768MB > (1 row) > > show enable_seqscan ; > enable_seqscan > ---------------- > on > (1 row) > > Below are the data definitions for the table/indexes in question: > > \d members > Table "members" > Column | Type | Modifiers > ---------------------+-----------------------------+----------- > memberid | numeric | not null > firstname | character varying(50) | > lastname | character varying(50) | > emailaddress | character varying(50) | > password | character varying(50) | > address1 | character varying(50) | > address2 | character varying(50) | > city | character varying(50) | > statecode | character varying(50) | > zipcode | character varying(50) | > birthdate | date | > emailok | numeric(2,0) | > gender | character varying(1) | > addeddate | timestamp without time zone | > emailbounced | numeric(2,0) | > changedate | timestamp without time zone | > optoutsource | character varying(100) | > websiteid | numeric | > promotionid | numeric | > sourceid | numeric | > siteid | character varying(64) | > srcwebsiteid | numeric | > homephone | character varying(20) | > homeareacode | character varying(10) | > campaignid | numeric | > srcmemberid | numeric | > optoutdate | date | > regcomplete | numeric(1,0) | > regcompletesourceid | numeric | > ipaddress | character varying(25) | > pageid | numeric | > streetaddressstatus | numeric(1,0) | > middlename | character varying(50) | > optinprechecked | numeric(1,0) | > optinposition | numeric | > homephonestatus | numeric | > addeddate_id | numeric | > changedate_id | numeric | > rpmindex | numeric | > optmode | numeric(1,0) | > countryid | numeric | > confirmoptin | numeric(2,0) | > bouncedate | date | > memberageid | numeric | > sourceid2 | numeric | > remoteuserid | character varying(50) | > goal | numeric(1,0) | > flowdepth | numeric | > pagetype | numeric | > savepassword | character varying(50) | > customerprofileid | numeric | > Indexes: > "email_website_unq" UNIQUE, btree (emailaddress, websiteid), > tablespace "members_idx" > "member_addeddateid_idx" btree (addeddate_id), tablespace > "members_idx" > "member_changedateid_idx" btree (changedate_id), tablespace > "members_idx" > "members_fdate_idx" btree (to_char_year_month(addeddate)), > tablespace "esave_idx" > "members_memberid_idx" btree (memberid), tablespace "members_idx" > "members_mid_emailok_idx" btree (memberid, emailaddress, zipcode, > firstname, emailok), tablespace "members_idx" > "members_sorted_idx_001" btree (websiteid, emailok, emailbounced, > addeddate, memberid, zipcode, statecode, emailaddress), tablespace > "members_idx" > "members_src_idx" btree (websiteid, emailbounced, sourceid), > tablespace "members_idx" > "members_wid_idx" btree (websiteid), tablespace "members_idx" > > select tablename, indexname, tablespace, indexdef from pg_indexes where > tablename = 'members'; > -[ RECORD > 1 ]---------------------------------------------------------------------------------------------------------------------------------------------------- > tablename | members > indexname | members_fdate_idx > tablespace | esave_idx > indexdef | CREATE INDEX members_fdate_idx ON members USING btree > (to_char_year_month(addeddate)) > -[ RECORD > 2 ]---------------------------------------------------------------------------------------------------------------------------------------------------- > tablename | members > indexname | member_changedateid_idx > tablespace | members_idx > indexdef | CREATE INDEX member_changedateid_idx ON members USING btree > (changedate_id) > -[ RECORD > 3 ]---------------------------------------------------------------------------------------------------------------------------------------------------- > tablename | members > indexname | member_addeddateid_idx > tablespace | members_idx > indexdef | CREATE INDEX member_addeddateid_idx ON members USING btree > (addeddate_id) > -[ RECORD > 4 ]---------------------------------------------------------------------------------------------------------------------------------------------------- > tablename | members > indexname | members_wid_idx > tablespace | members_idx > indexdef | CREATE INDEX members_wid_idx ON members USING btree > (websiteid) > -[ RECORD > 5 ]---------------------------------------------------------------------------------------------------------------------------------------------------- > tablename | members > indexname | members_src_idx > tablespace | members_idx > indexdef | CREATE INDEX members_src_idx ON members USING btree > (websiteid, emailbounced, sourceid) > -[ RECORD > 6 ]---------------------------------------------------------------------------------------------------------------------------------------------------- > tablename | members > indexname | members_sorted_idx_001 > tablespace | members_idx > indexdef | CREATE INDEX members_sorted_idx_001 ON members USING btree > (websiteid, emailok, emailbounced, addeddate, memberid, zipcode, > statecode, emailaddress) > -[ RECORD > 7 ]---------------------------------------------------------------------------------------------------------------------------------------------------- > tablename | members > indexname | members_mid_emailok_idx > tablespace | members_idx > indexdef | CREATE INDEX members_mid_emailok_idx ON members USING btree > (memberid, emailaddress, zipcode, firstname, emailok) > -[ RECORD > 8 ]---------------------------------------------------------------------------------------------------------------------------------------------------- > tablename | members > indexname | members_memberid_idx > tablespace | members_idx > indexdef | CREATE INDEX members_memberid_idx ON members USING btree > (memberid) > -[ RECORD > 9 ]---------------------------------------------------------------------------------------------------------------------------------------------------- > tablename | members > indexname | email_website_unq > tablespace | members_idx > indexdef | CREATE UNIQUE INDEX email_website_unq ON members USING > btree (emailaddress, websiteid) > > > This table has also been vacuumed analyzed as well: > > select * from pg_stat_all_tables where relname = 'members'; > -[ RECORD 1 ]----+------------------------------ > relid | 3112786 > schemaname | xxxxx > relname | members > seq_scan | 298 > seq_tup_read | 42791828896 > idx_scan | 31396925 > idx_tup_fetch | 1083796963 > n_tup_ins | 291308316 > n_tup_upd | 0 > n_tup_del | 4188020 > n_tup_hot_upd | 0 > n_live_tup | 285364632 > n_dead_tup | 109658 > last_vacuum | 2010-10-12 20:26:01.227393-04 > last_autovacuum | > last_analyze | 2010-10-12 20:28:01.105656-04 > last_autoanalyze | 2010-09-16 20:50:00.712418-04 > > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
> emailok | numeric(2,0) | Note that NUMERIC is meant for - really large numbers with lots of digits - or controlled precision and rounding (ie, order total isn't 99.999999999999 $) Accordingly, NUMERIC is a lot slower in all operations, and uses a lot more space, than all the other numeric types. I see many columns in your table that are declared as NUMERIC but should be BOOLs, or SMALLINTs, or INTs, or BIGINTs. Perhaps Oracle handles these differently, I dunno.
Just my take on this. The first thing I'd do is think real hard about whether you really really want 'numeric' instead of boolean, smallint, or integer. The second thing is that none of your indices (which specify a whole bunch of fields, by the way) have only just emailok, emailbounced, or only the pair of them. Without knowing the needs of your app, I would reconsider your index choices and go with fewer columns per index. For this particular query I would think either two indexes (depending on the cardinality of the data, one for each of emailok, emailbounced) or one index (containing both emailok, emailbounced) would make quite a bit of difference. Consider creating the indexes using a WITH clause, for example: CREATE INDEX members_just_an_example_idx ON members (emailok, emailbounced) WHERE emailok = 1 AND emailbounced = 0; Obviously that index is only useful in situations where both fields are specified with those values. Furthermore, if the result is such that a very high percentage of the table has those conditions a sequential scan is going to be cheaper, anyway. -- Jon
On 10/14/2010 4:10 PM, Jon Nelson wrote: > The first thing I'd do is think real hard about whether you really > really want 'numeric' instead of boolean, smallint, or integer. The > second thing is that none of your indices (which specify a whole bunch > of fields, by the way) have only just emailok, emailbounced, or only > the pair of them. Without knowing the needs of your app, I would > reconsider your index choices and go with fewer columns per index. > Also, make sure that the statistics is good, that histograms are large enough and that Geico (the genetic query optimizer) will really work hard to save you 15% or more on the query execution time. You can also make sure that any index existing index is used, by disabling the sequential scan and then activating and de-activating indexes with the dummy expressions, just as it was done with Oracle's rule based optimizer. I agree that a good data model is even more crucial for Postgres than is the case with Oracle. Oracle, because of its rich assortment of tweaking & hacking tools and parameters, can be made to perform, even if the model is designed by someone who didn't apply the rules of good design. Postgres is much more susceptible to bad models and it is much harder to work around a badly designed model in Postgres than in Oracle. What people do not understand is that every application in the world will become badly designed after years of maintenance, adding columns, creating additional indexes, views, tables and triggers and than deploying various tools to design applications. As noted by Murphy, things develop from bad to worse. Keep Postgres models simple and separated, because it's much easier to keep clearly defined models simple and effective than to keep models with 700 tables and 350 views, frequently with conflicting names, different columns named the same and same columns named differently. And monitor, monitor, monitor. Use strace, ltrace, pgstatspack, auto_explain, pgfouine, pgadmin, top, sar, iostat and all tools you can get hold of. Without the event interface, it's frequently a guessing game. It is, however, possible to manage things. If working with partitioning, be very aware that PostgreSQL optimizer has certain problems with partitions, especially with group functions. If you want speed, everything must be prefixed with partitioning column: indexes, expressions, joins. There is no explicit star schema and creating hash indexes will not buy you much, as a matter of fact, Postgres community is extremely suspicious of the hash indexes and I don't see them widely used. Having said that, I was able to solve the problems with my speed and partitioning. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com
On Thu, Oct 14, 2010 at 8:59 PM, Mladen Gogala <mladen.gogala@vmsinfo.com> wrote:
If working with partitioning, be very aware that PostgreSQL optimizer has certain problems with partitions, especially with group functions. If you want speed, everything must be prefixed with partitioning column: indexes, expressions, joins. There is no explicit star schema and creating hash indexes will not buy you much, as a matter of fact, Postgres community is extremely suspicious of the hash indexes and I don't see them widely used.Having said that, I was able to solve the problems with my speed and partitioning.
Could you elaborate on this, please? What do you mean by 'everythin must be prefixed with partitioning column?'
--sam
Samuel Gendler wrote: > > > On Thu, Oct 14, 2010 at 8:59 PM, Mladen Gogala > <mladen.gogala@vmsinfo.com <mailto:mladen.gogala@vmsinfo.com>> wrote: > > If working with partitioning, be very aware that PostgreSQL > optimizer has certain problems with partitions, especially with > group functions. If you want speed, everything must be prefixed > with partitioning column: indexes, expressions, joins. There is no > explicit star schema and creating hash indexes will not buy you > much, as a matter of fact, Postgres community is extremely > suspicious of the hash indexes and I don't see them widely used. > Having said that, I was able to solve the problems with my speed > and partitioning. > > > Could you elaborate on this, please? What do you mean by 'everythin > must be prefixed with partitioning column?' > > --sam If you have partitioned table part_tab, partitioned on the column item_date and if there is a global primary key in Oracle, let's call it item_id, then queries like "select * from part_tab where item_id=12345" will perform worse than queries with item_date" select * from part_tab where item_id=12345 and item_date='2010-10-15' This also applies to inserts and updates. Strictly speaking, the item_date column in the query above is not necessary, after all, the item_id column is the primary key. However, with range scans you will get much better results if you include the item_date column than if you use combination of columns without. The term "prefixed indexes" is borrowed from Oracle RDBMS and means that the beginning column in the index is the column on which the table is partitioned. Oracle, as opposed to Postgres, has global indexes, the indexes that span all partitions. PostgreSQL only maintains indexes on each of the partitions separately. Oracle calls such indexes "local indexes" and defines them on the partitioned table level. Here is a brief and rather succinct explanation of the terminology: http://www.oracle-base.com/articles/8i/PartitionedTablesAndIndexes.php Of, course, there are other differences between Oracle partitioning and PostgreSQL partitioning. The main difference is $10000/CPU. I am talking from experience: news=> \d moreover_documents Table "moreover.moreover_documents" Column | Type | Modifiers ----------------------+-----------------------------+----------- document_id | bigint | not null dre_reference | bigint | not null headline | character varying(4000) | author | character varying(200) | url | character varying(1000) | rank | bigint | content | text | stories_like_this | character varying(1000) | internet_web_site_id | bigint | not null harvest_time | timestamp without time zone | valid_time | timestamp without time zone | keyword | character varying(200) | article_id | bigint | not null media_type | character varying(20) | source_type | character varying(20) | created_at | timestamp without time zone | autonomy_fed_at | timestamp without time zone | language | character varying(150) | Indexes: "moreover_documents_pkey" PRIMARY KEY, btree (document_id) Triggers: insert_moreover_trigger BEFORE INSERT ON moreover_documents FOR EACH ROW EXE CUTE PROCEDURE moreover_insert_trgfn() Number of child tables: 8 (Use \d+ to list them.) The child tables are, of course, partitions. Here is the original: Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - Production With the Partitioning, Real Application Clusters, OLAP, Data Mining and Real Application Testing options SQL> desc moreover_documents Name Null? Type ----------------------------------------- -------- ---------------------------- DOCUMENT# NOT NULL NUMBER DRE_REFERENCE NOT NULL NUMBER HEADLINE VARCHAR2(4000) AUTHOR VARCHAR2(200) URL VARCHAR2(1000) RANK NUMBER CONTENT CLOB STORIES_LIKE_THIS VARCHAR2(1000) INTERNET_WEB_SITE# NOT NULL NUMBER HARVEST_TIME DATE VALID_TIME DATE KEYWORD VARCHAR2(200) ARTICLE_ID NOT NULL NUMBER MEDIA_TYPE VARCHAR2(20) CREATED_AT DATE SOURCE_TYPE VARCHAR2(50) PUBLISH_DATE DATE AUTONOMY_FED_AT DATE LANGUAGE VARCHAR2(150) SQL> I must say that it took me some time to get things right. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
On Thu, Oct 14, 2010 at 3:43 PM, Tony Capobianco <tcapobianco@prospectiv.com> wrote: > explain analyze create table tmp_srcmem_emws1 > as > select emailaddress, websiteid > from members > where emailok = 1 > and emailbounced = 0; *) as others have noted, none of your indexes will back this expression. For an index to match properly the index must have all the fields matched in the 'where' clause in left to right order. you could rearrange indexes you already have and probably get things to work properly. *) If you want things to go really fast, and the combination of emailok, emailbounced is a small percentage (say, less than 5) in the table, and you are not interested in the schema level changes your table is screaming, and the (1,0) combination is what you want to frequently match and you should consider: create function email_interesting(ok numeric, bounced numeric) returns bool as $$ select $1 = 1 and $2 = 0; $$ language sql immutable; create function members_email_interesting_idx on members(email_interesting(emailok, emailbounced)) where email_interesting(); This will build a partial index which you can query via: select emailaddress, websiteid from members where email_interesting(emailok, emailbounced); merlin
The recommendations on the numeric columns are fantastic. Thank you very much. We will revisit our methods of assigning datatypes when we migrate our data over from Oracle. Regarding the full table scans; it appears inevitable that full table scans are necessary for the volume of data involved and the present design of our indexes. Over time, indexes were added/removed to satisfy particular functionality. Considering this is our most important table, I will research exactly how this table is queried to better optimize/reorganize our indexes. Thanks for your help. Tony On Thu, 2010-10-14 at 23:59 -0400, Mladen Gogala wrote: > On 10/14/2010 4:10 PM, Jon Nelson wrote: > > The first thing I'd do is think real hard about whether you really > > really want 'numeric' instead of boolean, smallint, or integer. The > > second thing is that none of your indices (which specify a whole bunch > > of fields, by the way) have only just emailok, emailbounced, or only > > the pair of them. Without knowing the needs of your app, I would > > reconsider your index choices and go with fewer columns per index. > > > Also, make sure that the statistics is good, that histograms are large > enough and that Geico (the genetic query optimizer) will really work > hard to save you 15% or more on the query execution time. You can also > make sure that any index existing index is used, by disabling the > sequential scan and then activating and de-activating indexes with the > dummy expressions, just as it was done with Oracle's rule based optimizer. > I agree that a good data model is even more crucial for Postgres than is > the case with Oracle. Oracle, because of its rich assortment of tweaking > & hacking tools and parameters, can be made to perform, even if the > model is designed by someone who didn't apply the rules of good design. > Postgres is much more susceptible to bad models and it is much harder to > work around a badly designed model in Postgres than in Oracle. What > people do not understand is that every application in the world will > become badly designed after years of maintenance, adding columns, > creating additional indexes, views, tables and triggers and than > deploying various tools to design applications. As noted by Murphy, > things develop from bad to worse. Keep Postgres models simple and > separated, because it's much easier to keep clearly defined models > simple and effective than to keep models with 700 tables and 350 views, > frequently with conflicting names, different columns named the same and > same columns named differently. And monitor, monitor, monitor. Use > strace, ltrace, pgstatspack, auto_explain, pgfouine, pgadmin, top, sar, > iostat and all tools you can get hold of. Without the event interface, > it's frequently a guessing game. It is, however, possible to manage > things. If working with partitioning, be very aware that PostgreSQL > optimizer has certain problems with partitions, especially with group > functions. If you want speed, everything must be prefixed with > partitioning column: indexes, expressions, joins. There is no explicit > star schema and creating hash indexes will not buy you much, as a matter > of fact, Postgres community is extremely suspicious of the hash indexes > and I don't see them widely used. > Having said that, I was able to solve the problems with my speed and > partitioning. > > -- > Mladen Gogala > Sr. Oracle DBA > 1500 Broadway > New York, NY 10036 > (212) 329-5251 > www.vmsinfo.com > >
>> This table has approximately 300million rows. > > and your query grab rows=236 660 930 of them. An index might be > useless in this situation. I want to point out that this is probably the most important comment here. A couple of people have noted out that the index won't work for this query, but more importantly, an index is (probably) not desirable for this query. As an analogy (since everyone loves half-baked programming analogies), if you want to find a couple of bakeries to sponsor your MySQL Data Integrity Issues Awareness Walk by donating scones, you use the yellow pages. If you want to hit up every business in the area to donate whatever they can, you're better off canvasing the neighborhood. --- Maciek Sakrejda | System Architect | Truviso 1065 E. Hillsdale Blvd., Suite 215 Foster City, CA 94404 (650) 242-3500 Main www.truviso.com
> -----Original Message----- > From: Tony Capobianco [mailto:tcapobianco@prospectiv.com] > Sent: Thursday, October 14, 2010 3:43 PM > To: pgsql-performance@postgresql.org > Subject: oracle to psql migration - slow query in postgres > > We are in the process of testing migration of our oracle data > warehouse over to postgres. A potential showstopper are full > table scans on our members table. We can't function on > postgres effectively unless index scans are employed. I'm > thinking I don't have something set correctly in my > postgresql.conf file, but I'm not sure what. > > This table has approximately 300million rows. > > Version: > SELECT version(); > > version > -------------------------------------------------------------- > ---------------------------------------------------- > PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC > gcc (GCC) > 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit > > We have 4 quad-core processors and 32GB of RAM. The below > query uses the members_sorted_idx_001 index in oracle, but in > postgres, the optimizer chooses a sequential scan. > > explain analyze create table tmp_srcmem_emws1 as select > emailaddress, websiteid > from members > where emailok = 1 > and emailbounced = 0; > QUERY > PLAN > -------------------------------------------------------------- > ---------------------------------------------------------------- > Seq Scan on members (cost=0.00..14137154.64 rows=238177981 > width=29) (actual time=0.052..685834.785 rows=236660930 loops=1) > Filter: ((emailok = 1::numeric) AND (emailbounced = > 0::numeric)) Total runtime: 850306.220 ms > (3 rows) > > show shared_buffers ; > shared_buffers > ---------------- > 7680MB > (1 row) > > show effective_cache_size ; > effective_cache_size > ---------------------- > 22GB > (1 row) > > show work_mem ; > work_mem > ---------- > 768MB > (1 row) > > show enable_seqscan ; > enable_seqscan > ---------------- > on > (1 row) > > Below are the data definitions for the table/indexes in question: > > \d members > Table "members" > Column | Type | Modifiers > ---------------------+-----------------------------+----------- > memberid | numeric | not null > firstname | character varying(50) | > lastname | character varying(50) | > emailaddress | character varying(50) | > password | character varying(50) | > address1 | character varying(50) | > address2 | character varying(50) | > city | character varying(50) | > statecode | character varying(50) | > zipcode | character varying(50) | > birthdate | date | > emailok | numeric(2,0) | > gender | character varying(1) | > addeddate | timestamp without time zone | > emailbounced | numeric(2,0) | > changedate | timestamp without time zone | > optoutsource | character varying(100) | > websiteid | numeric | > promotionid | numeric | > sourceid | numeric | > siteid | character varying(64) | > srcwebsiteid | numeric | > homephone | character varying(20) | > homeareacode | character varying(10) | > campaignid | numeric | > srcmemberid | numeric | > optoutdate | date | > regcomplete | numeric(1,0) | > regcompletesourceid | numeric | > ipaddress | character varying(25) | > pageid | numeric | > streetaddressstatus | numeric(1,0) | > middlename | character varying(50) | > optinprechecked | numeric(1,0) | > optinposition | numeric | > homephonestatus | numeric | > addeddate_id | numeric | > changedate_id | numeric | > rpmindex | numeric | > optmode | numeric(1,0) | > countryid | numeric | > confirmoptin | numeric(2,0) | > bouncedate | date | > memberageid | numeric | > sourceid2 | numeric | > remoteuserid | character varying(50) | > goal | numeric(1,0) | > flowdepth | numeric | > pagetype | numeric | > savepassword | character varying(50) | > customerprofileid | numeric | > Indexes: > "email_website_unq" UNIQUE, btree (emailaddress, > websiteid), tablespace "members_idx" > "member_addeddateid_idx" btree (addeddate_id), tablespace > "members_idx" > "member_changedateid_idx" btree (changedate_id), > tablespace "members_idx" > "members_fdate_idx" btree > (to_char_year_month(addeddate)), tablespace "esave_idx" > "members_memberid_idx" btree (memberid), tablespace "members_idx" > "members_mid_emailok_idx" btree (memberid, emailaddress, > zipcode, firstname, emailok), tablespace "members_idx" > "members_sorted_idx_001" btree (websiteid, emailok, > emailbounced, addeddate, memberid, zipcode, statecode, > emailaddress), tablespace "members_idx" > "members_src_idx" btree (websiteid, emailbounced, > sourceid), tablespace "members_idx" > "members_wid_idx" btree (websiteid), tablespace "members_idx" > > select tablename, indexname, tablespace, indexdef from > pg_indexes where tablename = 'members'; -[ RECORD > 1 > ]------------------------------------------------------------- > -------------------------------------------------------------- > ------------------------- > tablename | members > indexname | members_fdate_idx > tablespace | esave_idx > indexdef | CREATE INDEX members_fdate_idx ON members USING btree > (to_char_year_month(addeddate)) > -[ RECORD > 2 > ]------------------------------------------------------------- > -------------------------------------------------------------- > ------------------------- > tablename | members > indexname | member_changedateid_idx > tablespace | members_idx > indexdef | CREATE INDEX member_changedateid_idx ON members > USING btree > (changedate_id) > -[ RECORD > 3 > ]------------------------------------------------------------- > -------------------------------------------------------------- > ------------------------- > tablename | members > indexname | member_addeddateid_idx > tablespace | members_idx > indexdef | CREATE INDEX member_addeddateid_idx ON members > USING btree > (addeddate_id) > -[ RECORD > 4 > ]------------------------------------------------------------- > -------------------------------------------------------------- > ------------------------- > tablename | members > indexname | members_wid_idx > tablespace | members_idx > indexdef | CREATE INDEX members_wid_idx ON members USING btree > (websiteid) > -[ RECORD > 5 > ]------------------------------------------------------------- > -------------------------------------------------------------- > ------------------------- > tablename | members > indexname | members_src_idx > tablespace | members_idx > indexdef | CREATE INDEX members_src_idx ON members USING btree > (websiteid, emailbounced, sourceid) > -[ RECORD > 6 > ]------------------------------------------------------------- > -------------------------------------------------------------- > ------------------------- > tablename | members > indexname | members_sorted_idx_001 > tablespace | members_idx > indexdef | CREATE INDEX members_sorted_idx_001 ON members > USING btree > (websiteid, emailok, emailbounced, addeddate, memberid, > zipcode, statecode, emailaddress) -[ RECORD > 7 > ]------------------------------------------------------------- > -------------------------------------------------------------- > ------------------------- > tablename | members > indexname | members_mid_emailok_idx > tablespace | members_idx > indexdef | CREATE INDEX members_mid_emailok_idx ON members > USING btree > (memberid, emailaddress, zipcode, firstname, emailok) -[ RECORD > 8 > ]------------------------------------------------------------- > -------------------------------------------------------------- > ------------------------- > tablename | members > indexname | members_memberid_idx > tablespace | members_idx > indexdef | CREATE INDEX members_memberid_idx ON members USING btree > (memberid) > -[ RECORD > 9 > ]------------------------------------------------------------- > -------------------------------------------------------------- > ------------------------- > tablename | members > indexname | email_website_unq > tablespace | members_idx > indexdef | CREATE UNIQUE INDEX email_website_unq ON members USING > btree (emailaddress, websiteid) > > > This table has also been vacuumed analyzed as well: > > select * from pg_stat_all_tables where relname = 'members'; > -[ RECORD 1 ]----+------------------------------ > relid | 3112786 > schemaname | xxxxx > relname | members > seq_scan | 298 > seq_tup_read | 42791828896 > idx_scan | 31396925 > idx_tup_fetch | 1083796963 > n_tup_ins | 291308316 > n_tup_upd | 0 > n_tup_del | 4188020 > n_tup_hot_upd | 0 > n_live_tup | 285364632 > n_dead_tup | 109658 > last_vacuum | 2010-10-12 20:26:01.227393-04 > last_autovacuum | > last_analyze | 2010-10-12 20:28:01.105656-04 > last_autoanalyze | 2010-09-16 20:50:00.712418-04 > > Tony, For your query: > select > emailaddress, websiteid > from members > where emailok = 1 > and emailbounced = 0; your table doesn't have any indexes where "emailok" or "emailbounced" are leading columns. That's why existing indexes can not be used. If you specified "websiteid" in the "where" clause then (most probably) the index members_sorted_idx_001 will be used (based on selectivity and statistics known to optimizer). If this query (as is - without "websiteid") is important for your app, create another index on (emailok, emailbounced) which should help, of course if selectivity of your where clause is good enough (not to perform full table scan). Regards, Igor Neyman
Thanks for all your responses. What's interesting is that an index is used when this query is executed in Oracle. It appears to do some parallel processing: SQL> set line 200 delete from plan_table; explain plan for select websiteid, emailaddress from members where emailok = 1 and emailbounced = 0; SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY()); SQL> 3 rows deleted. SQL> 2 3 4 5 Explained. SQL> SQL> PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Plan hash value: 4247959398 ------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 237M| 7248M| 469K (2)| 01:49:33 | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10000 | 237M| 7248M| 469K (2)| 01:49:33 | Q1,00 | P->S | QC (RAND) | | 3 | PX BLOCK ITERATOR | | 237M| 7248M| 469K (2)| 01:49:33 | Q1,00 | PCWC | | |* 4 | INDEX FAST FULL SCAN| MEMBERS_SORTED_IDX_001 | 237M| 7248M| 469K (2)| 01:49:33 | Q1,00 | PCWP | | ------------------------------------------------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - filter("EMAILBOUNCED"=0 AND "EMAILOK"=1) 16 rows selected. On Fri, 2010-10-15 at 13:43 -0400, Igor Neyman wrote: > > > -----Original Message----- > > From: Tony Capobianco [mailto:tcapobianco@prospectiv.com] > > Sent: Thursday, October 14, 2010 3:43 PM > > To: pgsql-performance@postgresql.org > > Subject: oracle to psql migration - slow query in postgres > > > > We are in the process of testing migration of our oracle data > > warehouse over to postgres. A potential showstopper are full > > table scans on our members table. We can't function on > > postgres effectively unless index scans are employed. I'm > > thinking I don't have something set correctly in my > > postgresql.conf file, but I'm not sure what. > > > > This table has approximately 300million rows. > > > > Version: > > SELECT version(); > > > > version > > -------------------------------------------------------------- > > ---------------------------------------------------- > > PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC > > gcc (GCC) > > 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit > > > > We have 4 quad-core processors and 32GB of RAM. The below > > query uses the members_sorted_idx_001 index in oracle, but in > > postgres, the optimizer chooses a sequential scan. > > > > explain analyze create table tmp_srcmem_emws1 as select > > emailaddress, websiteid > > from members > > where emailok = 1 > > and emailbounced = 0; > > QUERY > > PLAN > > -------------------------------------------------------------- > > ---------------------------------------------------------------- > > Seq Scan on members (cost=0.00..14137154.64 rows=238177981 > > width=29) (actual time=0.052..685834.785 rows=236660930 loops=1) > > Filter: ((emailok = 1::numeric) AND (emailbounced = > > 0::numeric)) Total runtime: 850306.220 ms > > (3 rows) > > > > show shared_buffers ; > > shared_buffers > > ---------------- > > 7680MB > > (1 row) > > > > show effective_cache_size ; > > effective_cache_size > > ---------------------- > > 22GB > > (1 row) > > > > show work_mem ; > > work_mem > > ---------- > > 768MB > > (1 row) > > > > show enable_seqscan ; > > enable_seqscan > > ---------------- > > on > > (1 row) > > > > Below are the data definitions for the table/indexes in question: > > > > \d members > > Table "members" > > Column | Type | Modifiers > > ---------------------+-----------------------------+----------- > > memberid | numeric | not null > > firstname | character varying(50) | > > lastname | character varying(50) | > > emailaddress | character varying(50) | > > password | character varying(50) | > > address1 | character varying(50) | > > address2 | character varying(50) | > > city | character varying(50) | > > statecode | character varying(50) | > > zipcode | character varying(50) | > > birthdate | date | > > emailok | numeric(2,0) | > > gender | character varying(1) | > > addeddate | timestamp without time zone | > > emailbounced | numeric(2,0) | > > changedate | timestamp without time zone | > > optoutsource | character varying(100) | > > websiteid | numeric | > > promotionid | numeric | > > sourceid | numeric | > > siteid | character varying(64) | > > srcwebsiteid | numeric | > > homephone | character varying(20) | > > homeareacode | character varying(10) | > > campaignid | numeric | > > srcmemberid | numeric | > > optoutdate | date | > > regcomplete | numeric(1,0) | > > regcompletesourceid | numeric | > > ipaddress | character varying(25) | > > pageid | numeric | > > streetaddressstatus | numeric(1,0) | > > middlename | character varying(50) | > > optinprechecked | numeric(1,0) | > > optinposition | numeric | > > homephonestatus | numeric | > > addeddate_id | numeric | > > changedate_id | numeric | > > rpmindex | numeric | > > optmode | numeric(1,0) | > > countryid | numeric | > > confirmoptin | numeric(2,0) | > > bouncedate | date | > > memberageid | numeric | > > sourceid2 | numeric | > > remoteuserid | character varying(50) | > > goal | numeric(1,0) | > > flowdepth | numeric | > > pagetype | numeric | > > savepassword | character varying(50) | > > customerprofileid | numeric | > > Indexes: > > "email_website_unq" UNIQUE, btree (emailaddress, > > websiteid), tablespace "members_idx" > > "member_addeddateid_idx" btree (addeddate_id), tablespace > > "members_idx" > > "member_changedateid_idx" btree (changedate_id), > > tablespace "members_idx" > > "members_fdate_idx" btree > > (to_char_year_month(addeddate)), tablespace "esave_idx" > > "members_memberid_idx" btree (memberid), tablespace "members_idx" > > "members_mid_emailok_idx" btree (memberid, emailaddress, > > zipcode, firstname, emailok), tablespace "members_idx" > > "members_sorted_idx_001" btree (websiteid, emailok, > > emailbounced, addeddate, memberid, zipcode, statecode, > > emailaddress), tablespace "members_idx" > > "members_src_idx" btree (websiteid, emailbounced, > > sourceid), tablespace "members_idx" > > "members_wid_idx" btree (websiteid), tablespace "members_idx" > > > > select tablename, indexname, tablespace, indexdef from > > pg_indexes where tablename = 'members'; -[ RECORD > > 1 > > ]------------------------------------------------------------- > > -------------------------------------------------------------- > > ------------------------- > > tablename | members > > indexname | members_fdate_idx > > tablespace | esave_idx > > indexdef | CREATE INDEX members_fdate_idx ON members USING btree > > (to_char_year_month(addeddate)) > > -[ RECORD > > 2 > > ]------------------------------------------------------------- > > -------------------------------------------------------------- > > ------------------------- > > tablename | members > > indexname | member_changedateid_idx > > tablespace | members_idx > > indexdef | CREATE INDEX member_changedateid_idx ON members > > USING btree > > (changedate_id) > > -[ RECORD > > 3 > > ]------------------------------------------------------------- > > -------------------------------------------------------------- > > ------------------------- > > tablename | members > > indexname | member_addeddateid_idx > > tablespace | members_idx > > indexdef | CREATE INDEX member_addeddateid_idx ON members > > USING btree > > (addeddate_id) > > -[ RECORD > > 4 > > ]------------------------------------------------------------- > > -------------------------------------------------------------- > > ------------------------- > > tablename | members > > indexname | members_wid_idx > > tablespace | members_idx > > indexdef | CREATE INDEX members_wid_idx ON members USING btree > > (websiteid) > > -[ RECORD > > 5 > > ]------------------------------------------------------------- > > -------------------------------------------------------------- > > ------------------------- > > tablename | members > > indexname | members_src_idx > > tablespace | members_idx > > indexdef | CREATE INDEX members_src_idx ON members USING btree > > (websiteid, emailbounced, sourceid) > > -[ RECORD > > 6 > > ]------------------------------------------------------------- > > -------------------------------------------------------------- > > ------------------------- > > tablename | members > > indexname | members_sorted_idx_001 > > tablespace | members_idx > > indexdef | CREATE INDEX members_sorted_idx_001 ON members > > USING btree > > (websiteid, emailok, emailbounced, addeddate, memberid, > > zipcode, statecode, emailaddress) -[ RECORD > > 7 > > ]------------------------------------------------------------- > > -------------------------------------------------------------- > > ------------------------- > > tablename | members > > indexname | members_mid_emailok_idx > > tablespace | members_idx > > indexdef | CREATE INDEX members_mid_emailok_idx ON members > > USING btree > > (memberid, emailaddress, zipcode, firstname, emailok) -[ RECORD > > 8 > > ]------------------------------------------------------------- > > -------------------------------------------------------------- > > ------------------------- > > tablename | members > > indexname | members_memberid_idx > > tablespace | members_idx > > indexdef | CREATE INDEX members_memberid_idx ON members USING btree > > (memberid) > > -[ RECORD > > 9 > > ]------------------------------------------------------------- > > -------------------------------------------------------------- > > ------------------------- > > tablename | members > > indexname | email_website_unq > > tablespace | members_idx > > indexdef | CREATE UNIQUE INDEX email_website_unq ON members USING > > btree (emailaddress, websiteid) > > > > > > This table has also been vacuumed analyzed as well: > > > > select * from pg_stat_all_tables where relname = 'members'; > > -[ RECORD 1 ]----+------------------------------ > > relid | 3112786 > > schemaname | xxxxx > > relname | members > > seq_scan | 298 > > seq_tup_read | 42791828896 > > idx_scan | 31396925 > > idx_tup_fetch | 1083796963 > > n_tup_ins | 291308316 > > n_tup_upd | 0 > > n_tup_del | 4188020 > > n_tup_hot_upd | 0 > > n_live_tup | 285364632 > > n_dead_tup | 109658 > > last_vacuum | 2010-10-12 20:26:01.227393-04 > > last_autovacuum | > > last_analyze | 2010-10-12 20:28:01.105656-04 > > last_autoanalyze | 2010-09-16 20:50:00.712418-04 > > > > > > > Tony, > For your query: > > > select > > emailaddress, websiteid > > from members > > where emailok = 1 > > and emailbounced = 0; > > your table doesn't have any indexes where "emailok" or "emailbounced" > are leading columns. > That's why existing indexes can not be used. > > If you specified "websiteid" in the "where" clause then (most probably) > the index members_sorted_idx_001 will be used (based on selectivity and > statistics known to optimizer). > > If this query (as is - without "websiteid") is important for your app, > create another index on (emailok, emailbounced) which should help, of > course if selectivity of your where clause is good enough (not to > perform full table scan). > > Regards, > Igor Neyman >
> -----Original Message----- > From: Tony Capobianco [mailto:tcapobianco@prospectiv.com] > Sent: Friday, October 15, 2010 2:14 PM > To: pgsql-performance@postgresql.org > Subject: Re: oracle to psql migration - slow query in postgres > > Thanks for all your responses. What's interesting is that an > index is used when this query is executed in Oracle. It > appears to do some parallel processing: > > SQL> set line 200 > delete from plan_table; > explain plan for > select websiteid, emailaddress > from members > where emailok = 1 > and emailbounced = 0; > > SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY()); > SQL> > 3 rows deleted. > > SQL> 2 3 4 5 > Explained. > > SQL> SQL> > PLAN_TABLE_OUTPUT > -------------------------------------------------------------- > -------------------------------------------------------------- > -------------------------------------------------------------- > -------------- > Plan hash value: 4247959398 > > -------------------------------------------------------------- > ----------------------------------------------------------------- > | Id | Operation | Name | > Rows | Bytes > | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | > -------------------------------------------------------------- > ----------------------------------------------------------------- > | 0 | SELECT STATEMENT | | 237M| > 7248M| 469K (2)| 01:49:33 | | | | > | 1 | PX COORDINATOR | | | > | | | | | | > | 2 | PX SEND QC (RANDOM) | :TQ10000 | 237M| > 7248M| 469K (2)| 01:49:33 | Q1,00 | P->S | QC (RAND) | > | 3 | PX BLOCK ITERATOR | | 237M| > 7248M| 469K (2)| 01:49:33 | Q1,00 | PCWC | | > |* 4 | INDEX FAST FULL SCAN| MEMBERS_SORTED_IDX_001 | 237M| > 7248M| 469K (2)| 01:49:33 | Q1,00 | PCWP | | > -------------------------------------------------------------- > ----------------------------------------------------------------- > > PLAN_TABLE_OUTPUT > -------------------------------------------------------------- > -------------------------------------------------------------- > -------------------------------------------------------------- > -------------- > > Predicate Information (identified by operation id): > --------------------------------------------------- > > 4 - filter("EMAILBOUNCED"=0 AND "EMAILOK"=1) > > 16 rows selected. > > 1. Postgres doesn't have "FAST FULL SCAN" because even if all the info is in the index, it need to visit the row in the table ("visibility" issue). 2. Postgres doesn't have parallel executions. BUT, it's free anf has greate community support, as you already saw. Regards, Igor Neyman
Very true Igor! Free is my favorite price. I'll figure a way around this issue. Thanks for your help. Tony On Fri, 2010-10-15 at 14:54 -0400, Igor Neyman wrote: > > -----Original Message----- > > From: Tony Capobianco [mailto:tcapobianco@prospectiv.com] > > Sent: Friday, October 15, 2010 2:14 PM > > To: pgsql-performance@postgresql.org > > Subject: Re: oracle to psql migration - slow query in postgres > > > > Thanks for all your responses. What's interesting is that an > > index is used when this query is executed in Oracle. It > > appears to do some parallel processing: > > > > SQL> set line 200 > > delete from plan_table; > > explain plan for > > select websiteid, emailaddress > > from members > > where emailok = 1 > > and emailbounced = 0; > > > > SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY()); > > SQL> > > 3 rows deleted. > > > > SQL> 2 3 4 5 > > Explained. > > > > SQL> SQL> > > PLAN_TABLE_OUTPUT > > -------------------------------------------------------------- > > -------------------------------------------------------------- > > -------------------------------------------------------------- > > -------------- > > Plan hash value: 4247959398 > > > > -------------------------------------------------------------- > > ----------------------------------------------------------------- > > | Id | Operation | Name | > > Rows | Bytes > > | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | > > -------------------------------------------------------------- > > ----------------------------------------------------------------- > > | 0 | SELECT STATEMENT | | 237M| > > 7248M| 469K (2)| 01:49:33 | | | | > > | 1 | PX COORDINATOR | | | > > | | | | | | > > | 2 | PX SEND QC (RANDOM) | :TQ10000 | 237M| > > 7248M| 469K (2)| 01:49:33 | Q1,00 | P->S | QC (RAND) | > > | 3 | PX BLOCK ITERATOR | | 237M| > > 7248M| 469K (2)| 01:49:33 | Q1,00 | PCWC | | > > |* 4 | INDEX FAST FULL SCAN| MEMBERS_SORTED_IDX_001 | 237M| > > 7248M| 469K (2)| 01:49:33 | Q1,00 | PCWP | | > > -------------------------------------------------------------- > > ----------------------------------------------------------------- > > > > PLAN_TABLE_OUTPUT > > -------------------------------------------------------------- > > -------------------------------------------------------------- > > -------------------------------------------------------------- > > -------------- > > > > Predicate Information (identified by operation id): > > --------------------------------------------------- > > > > 4 - filter("EMAILBOUNCED"=0 AND "EMAILOK"=1) > > > > 16 rows selected. > > > > > > 1. Postgres doesn't have "FAST FULL SCAN" because even if all the info > is in the index, it need to visit the row in the table ("visibility" > issue). > > 2. Postgres doesn't have parallel executions. > > BUT, it's free anf has greate community support, as you already saw. > > Regards, > Igor Neyman >