Thread: Which query is less expensive / faster?
Hi all,

I have a couple different ways to get results out of my table structure, but I'm unsure which way is faster or less expensive to run:

Option 1: Query a table of 4 million rows, on 4 indexed columns. It will return 1 row:

SELECT field1, field2, field3, field4
FROM tablea
WHERE field1 = $1 AND field2 = $2 AND field3 = $3 AND field4 = $4

Option 2: Query a table of 200,000 rows on 1 indexed column. It will return 800 rows:

SELECT *
FROM tableb
WHERE field1 = $1

Which one is going to return results the fastest, with the least expense to the database server?
I'm not an expert on Postgres internals, but in general: if you have a multicolumn btree index on the 4 fields, the lookup should take around log_F(4M) page reads. If we guess F to be 5, that's around 10 I/Os plus 1 block read for the row itself. A hash index for the same lookup costs about the same or less. If only some subset of the fields is covered by a btree index, it costs log_F(4M) plus reading all the blocks matching that subset, which is still better than a sequential scan. Another possibility, which requires careful analysis of the value frequencies, is intersecting the rows from 4 separate single-column indexes and finding the 1 row that matches all of them.

In any case, when in doubt, run EXPLAIN on your query - see the documentation.

Regards,
tzahi.
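P.S. A rough sketch of what I mean, reusing the table and column names from your queries (the index name and the literal values are made up - substitute whatever you would actually pass as $1..$4):

-- multicolumn btree on all four lookup columns
CREATE INDEX tablea_f1_f2_f3_f4_idx
    ON tablea (field1, field2, field3, field4);

-- option 1: literals here are placeholders for your real $1..$4 values
EXPLAIN ANALYZE
SELECT field1, field2, field3, field4
FROM tablea
WHERE field1 = 1 AND field2 = 2 AND field3 = 3 AND field4 = 4;

-- option 2
EXPLAIN ANALYZE
SELECT *
FROM tableb
WHERE field1 = 1;

The first plan should show an index scan on the multicolumn index; compare the estimated costs and actual times EXPLAIN ANALYZE reports for the two.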
Thanks for the reply... but which query will be faster and less expensive?

I don't have a table now with 4 million rows, but I'm thinking of building such a table. Querying it would return 1 row. The alternative is to query an existing table of 200k rows, and return 800 rows.

Option 1: Query a table of 4 million rows, on 4 indexed columns. It will return 1 row:

SELECT field1, field2, field3, field4
FROM tablea
WHERE field1 = $1 AND field2 = $2 AND field3 = $3 AND field4 = $4

Option 2: Query a table of 200,000 rows on 1 indexed column. It will return 800 rows:

SELECT *
FROM tableb
WHERE field1 = $1

Which one is going to return results the fastest, with the least expense to the database server?
Jon Asher wrote:
> Which one is going to return results the fastest, with the least
> expense to the database server?

The answer can only be "that depends". Maybe option 1 has very wide columns for indexing and a poor spread of values. Maybe option 2 has only three small columns, the indexed column is very discriminating, and the whole table plus index fits in RAM. Maybe you have an array of 15,000rpm SCSI disks, maybe you have a laptop IDE drive, maybe the client is at the end of a 56k modem line and fetching 800 rows will take 30 seconds anyway.

Of course, that's all without taking into account whatever else the database is doing, not to mention the rest of the machine.

Now, in general fetching one row will be quicker, but without real information no-one can say.

The best and quickest solution? Mock up some tables with test data and try it with your actual setup. Make sure your test data has a realistic distribution of values and number of rows. I got caught out a couple of weeks ago where one small and insignificant table crippled a big stats recalculation - all because my testing had focused on the typical case of a few dozen rows and not the occasional case of a thousand or so.

--
Richard Huxton
Archonet Ltd
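P.S. If you're on 8.0, generate_series() makes mocking up the 4-million-row table easy. A very rough sketch only - the column types, value expressions, and table/index names below are made up, so adjust them to match the data you actually expect:

-- throwaway copy of the proposed big table (guessing integer columns)
CREATE TABLE tablea_test (
    field1 int,
    field2 int,
    field3 int,
    field4 int
);

-- ~4 million rows of dummy data; pick expressions that mimic your real spread of values
INSERT INTO tablea_test
SELECT i % 2000, (i / 2000) % 2000, i % 13, i % 7
FROM generate_series(1, 4000000) AS s(i);

CREATE INDEX tablea_test_f1_f2_f3_f4_idx
    ON tablea_test (field1, field2, field3, field4);

-- refresh planner statistics before timing anything
ANALYZE tablea_test;

EXPLAIN ANALYZE
SELECT field1, field2, field3, field4
FROM tablea_test
WHERE field1 = 42 AND field2 = 7 AND field3 = 3 AND field4 = 1;

Do the same for a mock of the 200k-row table and compare the EXPLAIN ANALYZE output on your own hardware.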
Jon Asher wrote:
> I don't have a table now with 4 million rows, but I'm thinking of building
> such a table. Querying it would return 1 row. The alternative is to query
> an existing table of 200k rows, and return 800 rows.

Actually, as a side-issue to my last reply... How does one row contain the same information as 800?

--
Richard Huxton
Archonet Ltd