Thread: index problem
Hi, I have that: 1) db=> explain select pxygy_pid from prog_dgy_xy where pxygy_pid=12121; NOTICE: QUERY PLAN: Group (cost=0.00..29970.34 rows=921 width=4) -> Index Scan using progdgyxy_idx2 on prog_dgy_xy (cost=0.00..29947.32 rows=9210 width=4) than: 2) db=> explain select pxygy_pid from prog_dgy_xy where pxygy_pid>12121; NOTICE: QUERY PLAN: Group (cost=66927.88..67695.39 rows=30700 width=4) -> Sort (cost=66927.88..66927.88 rows=307004 width=4) -> SeqScan on prog_dgy_xy (cost=0.00..32447.66 rows=307004 width=4) I making some banchmarks on: oracle vs postgres vs mysql. And this is breaking me now;) Mysql and oracle width same table and index use that index on pxygy_pid; I had vacuum before. Time with mysql: bash-2.04$ time echo " select count(*) from PROG_DGY_XY where pxygy_pid>12121;" | mysql -uuser -ppasswd db count(*) 484984 real 0m13.761s user 0m0.008s sys 0m0.019s Time with postgres: bash-2.04$ time echo "select count(*) from PROG_DGY_XY where pxygy_pid>12121 " | psql -Uuser db count -------- 484984 (1 row) real 0m22.480s user 0m0.011s sys 0m0.021s And this is just a little part of another selects joining tables, but because this index is not used, selecting from 2 tables (which has indexes, and keys on joining collumns) takes extrem time for postgres: 2m14.978s while for mysql it takes: 0m0.578s !!! this select is: select distinct PROG_ID,PROG_FTYPE,PROG_FCASTHOUR,PROG_DATE from PROG_DATA, PROG_DGY_XY where prog_id=pxygy_pid order by prog_date,prog_ftype,prog_fcasthour indexes: PROG_DATA: create index prod_data_idx1 on prog_data (prog_date,prog_ftype,prog_fcasthour); prog_id is primary key PROG_DGY_XY: create unique index progdgyxy_idx1 on PROG_DGY_XY (PXYGY_PID,PXYGY_X,PXYGY_Y); create index progdgyxy_idx2 on PROG_DGY_XY (PXYGY_PID); Thx CoL
On Tue, 16 Oct 2001, CoL wrote: > --------------------------- > The 2 table query, where prog_data has ~8800 rowsn and index on prog_id: > bash-2.04$ time echo "explain select distinct > prog_id,prog_ftype,prog_fcasthour,prog_date from prog_dgy_xy,prog_data > where pxygy_pid=prog_id " | psql -Uuser db > NOTICE: QUERY PLAN: > > Unique (cost=7432549.69..7680455.07 rows=2479054 width=32) > -> Sort (cost=7432549.69..7432549.69 rows=24790538 width=32) > -> Merge Join (cost=148864.65..161189.33 rows=24790538 width=32) > -> Index Scan using prog_data_pkey on prog_data > (cost=0.00..701.12 rows=8872 width=28) > -> Sort (cost=148864.65..148864.65 rows=921013 width=4) > -> Seq Scan on prog_dgy_xy (cost=0.00..30145.13 > rows=921013 width=4) I'm guessing that the approximately 25 million row estimate on the join has to be wrong as well given that prog_data.prog_id should be unique. Hmm, does the explain change if you vacuum analyze the other table (prog_data)? If not, what does explain show if you do a set enable_seqscan='off'; before it?
On Mon, 15 Oct 2001, Szabo Zoltan wrote: > Hi, > > I have that: > > 1) > db=> explain select pxygy_pid from prog_dgy_xy where pxygy_pid=12121; > NOTICE: QUERY PLAN: > > Group (cost=0.00..29970.34 rows=921 width=4) > -> Index Scan using progdgyxy_idx2 on prog_dgy_xy > (cost=0.00..29947.32 rows=9210 width=4) > > than: > 2) > db=> explain select pxygy_pid from prog_dgy_xy where pxygy_pid>12121; > NOTICE: QUERY PLAN: > > Group (cost=66927.88..67695.39 rows=30700 width=4) > -> Sort (cost=66927.88..66927.88 rows=307004 width=4) > -> Seq Scan on prog_dgy_xy (cost=0.00..32447.66 rows=307004 > width=4) > > I making some banchmarks on: oracle vs postgres vs mysql. And this is > breaking me now;) Mysql and oracle width same table and index use that > index on pxygy_pid; > I had vacuum before. I assume you mean you did a vacuum analyze (a plain vacuum isn't sufficient). If you did just do a regular vacuum, do a vacuum analyze to get the updated statistics. How many rows actually match pxygy_pid>12121? Is 307000 rows a reasonable estimate? How many rows are in the table?
I forget: select version(); version --------------------------------------------------------------------- PostgreSQL 7.1.3 on i386-unknown-freebsd4.3, compiledby GCC 2.95.3 It seems that there are index using problems in 7.1.3 ? (checkin same problem in comp.databases.postgresql.bugs msg from Orion) thx CoL Szabo Zoltan wrote: > Hi, > > I have that: > > 1) > db=> explain select pxygy_pid from prog_dgy_xy where pxygy_pid=12121; > NOTICE: QUERY PLAN: > > Group (cost=0.00..29970.34 rows=921 width=4) > -> Index Scan using progdgyxy_idx2 on prog_dgy_xy > (cost=0.00..29947.32 rows=9210 width=4) > > than: > 2) > db=> explain select pxygy_pid from prog_dgy_xy where pxygy_pid>12121; > NOTICE: QUERY PLAN: > > Group (cost=66927.88..67695.39 rows=30700 width=4) > -> Sort (cost=66927.88..66927.88 rows=307004 width=4) > -> Seq Scan on prog_dgy_xy (cost=0.00..32447.66 rows=307004 > width=4) > > I making some banchmarks on: oracle vs postgres vs mysql. And this is > breaking me now;) Mysql and oracle width same table and index use that > index on pxygy_pid; > I had vacuum before. > > Time with mysql: > > bash-2.04$ time echo " select count(*) from PROG_DGY_XY where > pxygy_pid>12121;" | mysql -uuser -ppasswd db > count(*) > 484984 > > real 0m13.761s > user 0m0.008s > sys 0m0.019s > > Time with postgres: > bash-2.04$ time echo "select count(*) from PROG_DGY_XY where > pxygy_pid>12121 " | psql -Uuser db > count > -------- > 484984 > (1 row) > > > real 0m22.480s > user 0m0.011s > sys 0m0.021s > > And this is just a little part of another selects joining tables, but > because this index is not used, selecting from 2 tables (which has > indexes, and keys on joining collumns) takes extrem time for postgres: > 2m14.978s while for mysql it takes: 0m0.578s !!! > > this select is: select distinct > PROG_ID,PROG_FTYPE,PROG_FCASTHOUR,PROG_DATE from PROG_DATA, PROG_DGY_XY > where prog_id=pxygy_pid order by prog_date,prog_ftype,prog_fcasthour > > indexes: > PROG_DATA: > create index prod_data_idx1 on prog_data > (prog_date,prog_ftype,prog_fcasthour); > prog_id is primary key > > PROG_DGY_XY: > create unique index progdgyxy_idx1 on PROG_DGY_XY > (PXYGY_PID,PXYGY_X,PXYGY_Y); > create index progdgyxy_idx2 on PROG_DGY_XY (PXYGY_PID); > > > Thx > CoL > -- [ Szabo Zoltan ] [ software fejleszto ] [ econet.hu Informatikai Rt. ] [ 1117 Budapest, Hauszmann A. u. 3. ] [ tel.: 371 2100 fax: 371 2101 ]
Hi, I did not make vacuum analyze ;), the vacuum once now: vacuumdb -Uuser -ddb -v -tprog_dgy_xy NOTICE: --Relation prog_dgy_xy-- NOTICE: Pages 20935: Changed 0, reaped 0, Empty 0, New 0; Tup 921013: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 178, MaxLen 184; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 2.71s/0.32u sec. NOTICE: Index progdgyxy_idx1: Pages 6679; Tuples 921013. CPU 1.41s/1.40u sec. NOTICE: Index progdgyxy_idx2: Pages 2019; Tuples 921013. CPU 0.28s/1.28u sec. I make it with -z too. So this table has more 921013 rows. The query show the same as bellow. The version is 7.1.3. ------------------------- One more interesting: the insering of these rows. Postgres: bash-2.04$ time psql -q -Uuser -f prog_dgy_xy.dump db real 131m50.006s user 3m21.838s sys 1m20.963s Mysql: bash-2.04$ time cat prog_dgy_xy.dump | mysql -uuser -ppass db real 24m50.137s user 2m6.629s sys 1m37.757s the dump file was: insert into table (...) values (...); I tried with copy, and to add begin; inserts; commit; , but the result with same time :( [For Oracle 8.1.6 sqlloader it takes 450 sec ;) ] --------------------------- The 2 table query, where prog_data has ~8800 rowsn and index on prog_id: bash-2.04$ time echo "explain select distinct prog_id,prog_ftype,prog_fcasthour,prog_date from prog_dgy_xy,prog_data where pxygy_pid=prog_id " | psql -Uuser db NOTICE: QUERY PLAN: Unique (cost=7432549.69..7680455.07 rows=2479054 width=32) -> Sort (cost=7432549.69..7432549.69 rows=24790538 width=32) -> Merge Join (cost=148864.65..161189.33 rows=24790538 width=32) -> Index Scan using prog_data_pkeyon prog_data (cost=0.00..701.12 rows=8872 width=28) -> Sort (cost=148864.65..148864.65 rows=921013 width=4) -> Seq Scan on prog_dgy_xy (cost=0.00..30145.13 rows=921013 width=4) Time: !!! real 2m3.620s the same query with mysql (i did explain in mysql, and says it use the indexes): real 0m1.998s !!! I just askin why? and why just using the index on releation "=". (same table, same index, vacuumed) (made the test more than twice) It seams to be a 7.1.3 bug? i do not test yet with 7.1.2 but tomorrow i will. CoL Stephan Szabo wrote: > On Mon, 15 Oct 2001, Szabo Zoltan wrote: > > >>Hi, >> >>I have that: >> >>1) >>db=> explain select pxygy_pid from prog_dgy_xy where pxygy_pid=12121; >>NOTICE: QUERY PLAN: >> >>Group (cost=0.00..29970.34 rows=921 width=4) >> -> Index Scan using progdgyxy_idx2 on prog_dgy_xy >>(cost=0.00..29947.32 rows=9210 width=4) >> >>than: >>2) >>db=> explain select pxygy_pid from prog_dgy_xy where pxygy_pid>12121; >>NOTICE: QUERY PLAN: >> >>Group (cost=66927.88..67695.39 rows=30700 width=4) >> -> Sort (cost=66927.88..66927.88 rows=307004 width=4) >> -> Seq Scan on prog_dgy_xy (cost=0.00..32447.66 rows=307004 >>width=4) >> >>I making some banchmarks on: oracle vs postgres vs mysql. And this is >>breaking me now;) Mysql and oracle width same table and index use that >>index on pxygy_pid; >>I had vacuum before. >> > > I assume you mean you did a vacuum analyze (a plain vacuum isn't > sufficient). If you did just do a regular vacuum, do a vacuum analyze > to get the updated statistics. > > How many rows actually match pxygy_pid>12121? Is 307000 rows a reasonable > estimate? How many rows are in the table? > > >
> > Hmm, does the explain change if you vacuum analyze the other table > > (prog_data)? If not, what does explain show if you do a > > set enable_seqscan='off'; > > before it? Did you do the vacuum analyze on the other table (prog_data) as well? It seems to be overestimating the number of joined rows, and I wonder if it would choose a different plan if it had the correct number. > The result: > db=>set enable_seqscan='off'; > db=>explain select distinct prog_id,prog_ftype,prog_fcasthour,prog_date > from prog_dgy_xy,prog_data where pxygy_pid=prog_id; > NOTICE: QUERY PLAN: > > Unique (cost=7606982.10..7854887.48 rows=2479054 width=32) > -> Sort (cost=7606982.10..7606982.10 rows=24790538 width=32) > -> Merge Join (cost=0.00..335621.73 rows=24790538 width=32) > -> Index Scan using progdgyxy_idx2 on prog_dgy_xy > (cost=0.00..323297.05 rows=921013 width=4) > -> Index Scan using prog_data_pkey on prog_data > (cost=0.00..701.12 rows=8872 width=28) > > It "seems" index is used, but the same result :(((, and bigger execution > time: real 3m41.830s Well, that means the plan it chose before was better, so enable_seqscan isn't a win here. > And why: > POSTGRES: > set enable_seqscan ='off'; select count(*) from prog_dgy_xy where > pxygy_pid<13161; > count > -------- > 900029 > real 2m34.340s > explain: > Aggregate (cost=327896.89..327896.89 rows=1 width=0) > -> Index Scan using progdgyxy_idx2 on prog_dgy_xy > (cost=0.00..325594.54 rows=920940 width=0) It's estimating the entire table will be seen (or most of it anyway), so it would choose Seq Scan as faster, but you've basically disallowed that with the enable_seqscan='off'. Is it faster without the explicit hint (it probably will be). Index Scans are not always better than Sequence Scans (especially when traversing most of the table as in the above) and you don't want to use the enable_* unless it actually is giving you a performance increase.
Hi, Stephan Szabo wrote: > On Tue, 16 Oct 2001, CoL wrote: > > >>--------------------------- >>The 2 table query, where prog_data has ~8800 rowsn and index on prog_id: >>bash-2.04$ time echo "explain select distinct >>prog_id,prog_ftype,prog_fcasthour,prog_date from prog_dgy_xy,prog_data >>where pxygy_pid=prog_id " | psql -Uuser db >>NOTICE: QUERY PLAN: >> >>Unique (cost=7432549.69..7680455.07 rows=2479054 width=32) >> -> Sort (cost=7432549.69..7432549.69 rows=24790538 width=32) >> -> Merge Join (cost=148864.65..161189.33 rows=24790538 width=32) >> -> Index Scan using prog_data_pkey on prog_data >>(cost=0.00..701.12 rows=8872 width=28) >> -> Sort (cost=148864.65..148864.65 rows=921013 width=4) >> -> Seq Scan on prog_dgy_xy (cost=0.00..30145.13 >>rows=921013 width=4) >> > > I'm guessing that the approximately 25 million row estimate on the join > has to be wrong as well given that prog_data.prog_id should be unique. > > Hmm, does the explain change if you vacuum analyze the other table > (prog_data)? If not, what does explain show if you do a > set enable_seqscan='off'; > before it? The result: db=>set enable_seqscan='off'; db=>explain select distinct prog_id,prog_ftype,prog_fcasthour,prog_date from prog_dgy_xy,prog_data where pxygy_pid=prog_id; NOTICE: QUERY PLAN: Unique (cost=7606982.10..7854887.48 rows=2479054 width=32) -> Sort (cost=7606982.10..7606982.10 rows=24790538 width=32) -> Merge Join (cost=0.00..335621.73 rows=24790538 width=32) -> Index Scan using progdgyxy_idx2on prog_dgy_xy (cost=0.00..323297.05 rows=921013 width=4) -> Index Scan using prog_data_pkey on prog_data (cost=0.00..701.12 rows=8872 width=28) It "seems" index is used, but the same result :(((, and bigger execution time: real 3m41.830s What is in tables? prog_data contains unique id and other info. prog_dgy_xy contains that id with x,y coordinates (so many ids from prog_data with unique x,y) #prog_data: #prog_id, prog_ftype, prog_fcasthour, prog_date #1 'type' 6 2001-10-14 12:00:00 #2 'type' 12 2001-10-14 12:00:00 #prog_dgy_xy: #pxygy_pid, pxygy_x, pxygy_y #1 0.1 0.1 #1 0.1 0.15 How can this query takes real 0m1.755s for mysql, [17 sec for oracle], and 2-3 minutes!! for postgres? And why: POSTGRES: set enable_seqscan ='off'; select count(*) from prog_dgy_xy where pxygy_pid<13161; count -------- 900029 real 2m34.340s explain: Aggregate (cost=327896.89..327896.89 rows=1 width=0) -> Index Scan using progdgyxy_idx2 on prog_dgy_xy (cost=0.00..325594.54 rows=920940 width=0) MYSQL: select count(pxygy_pid) from PROG_DGY_XY where pxygy_pid<13161 count(pxygy_pid) 900029 real 0m27.878s explain: table type possible_keys key key_len ref rows Extra PROG_DGY_XY range progdgyxy_idx1,progdgyxy_idx2 progdgyxy_idx2 4 NULL 906856 where used; Using index The same time difference in case of: = or >, however explain says, cause seq scan is off, the index is used. I did vacuum, and vacuum analyze too before. PS: I think i have to make a site for that, cause there are many questions :), and weird things. I love postgres but this makes me "hm?". Today i'll make these test under 7.1.2. thx CoL