Thread: Why LIMIT after scanning the table?
I'm doing something where I just need to know if we have more than 100
rows in a table. Not wanting to scan the whole table, I thought I'd get
cute...

explain select count(*)
    FROM (SELECT * FROM email_rank WHERE project_id = :ProjectID LIMIT 100) AS t1;
                                      QUERY PLAN
-------------------------------------------------------------------------------------
 Aggregate  (cost=111.32..111.32 rows=1 width=48)
   ->  Subquery Scan t1  (cost=0.00..111.07 rows=100 width=48)
         ->  Limit  (cost=0.00..111.07 rows=100 width=48)
               ->  Seq Scan on email_rank  (cost=0.00..76017.40 rows=68439 width=48)
                     Filter: (project_id = 24)

The idea is that the inner-most query would only read the first 100 rows
it finds, then stop. Instead, if explain is to be believed (and speed
testing seems to indicate it's accurate), we'll read the entire table,
*then* pick the first 100 rows. Why is that?

FYI...

            Table "public.email_rank"
        Column         |  Type   |     Modifiers
-----------------------+---------+--------------------
 project_id            | integer | not null
 id                    | integer | not null
 first_date            | date    | not null
 last_date             | date    | not null
 day_rank              | integer | not null default 0
 day_rank_previous     | integer | not null default 0
 overall_rank          | integer | not null default 0
 overall_rank_previous | integer | not null default 0
 work_today            | bigint  | not null default 0
 work_total            | bigint  | not null default 0
Indexes: email_rank_pkey primary key btree (project_id, id),
         email_rank__day_rank btree (project_id, day_rank),
         email_rank__overall_rank btree (project_id, overall_rank)

--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
On Wed, 30 Apr 2003, Jim C. Nasby wrote:

> I'm doing something where I just need to know if we have more than 100
> rows in a table. Not wanting to scan the whole table, I thought I'd get
> cute...
>
> explain select count(*)
> FROM (SELECT * FROM email_rank WHERE project_id = :ProjectID LIMIT 100) AS t1;
>                                       QUERY PLAN
> -------------------------------------------------------------------------------------
>  Aggregate  (cost=111.32..111.32 rows=1 width=48)
>    ->  Subquery Scan t1  (cost=0.00..111.07 rows=100 width=48)
>          ->  Limit  (cost=0.00..111.07 rows=100 width=48)
>                ->  Seq Scan on email_rank  (cost=0.00..76017.40 rows=68439 width=48)
>                      Filter: (project_id = 24)
>
> The idea is that the inner-most query would only read the first 100 rows
> it finds, then stop. Instead, if explain is to be believed (and speed
> testing seems to indicate it's accurate), we'll read the entire table,
> *then* pick the first 100 rows. Why is that?

I'd suggest looking at EXPLAIN ANALYZE rather than EXPLAIN. In most cases
I've seen, the actual rows it grabs is limit+1 (I think CVS tip will only
grab limit). Plain EXPLAIN shows you the full estimated row count for the
sequence scan, but notice that the Limit node's cost is lower than that of
the sequence scan.
"Jim C. Nasby" <jim@nasby.net> writes: > explain select count(*) > FROM (SELECT * FROM email_rank WHERE project_id = :ProjectID LIMIT 100) AS t1; > The idea is that the inner-most query would only read the first 100 rows > it finds, then stop. Instead, if explain is to be believed (and speed > testing seems to indicate it's accurate), we'll read the entire table, > *then* pick the first 100 rows. Why is that? You're misreading the EXPLAIN output. Try EXPLAIN ANALYZE to see how many rows really get fetched. regards, tom lane
If you only want to know whether there are more than 100 rows, why not do:

if exists (
    SELECT 1
    FROM email_rank
    WHERE project_id = :ProjectID
    OFFSET 100 LIMIT 1
)

"Jim C. Nasby" wrote:
>
> I'm doing something where I just need to know if we have more than 100
> rows in a table. Not wanting to scan the whole table, I thought I'd get
> cute...
>
> explain select count(*)
> FROM (SELECT * FROM email_rank WHERE project_id = :ProjectID LIMIT 100) AS t1;
>                                       QUERY PLAN
> -------------------------------------------------------------------------------------
>  Aggregate  (cost=111.32..111.32 rows=1 width=48)
>    ->  Subquery Scan t1  (cost=0.00..111.07 rows=100 width=48)
>          ->  Limit  (cost=0.00..111.07 rows=100 width=48)
>                ->  Seq Scan on email_rank  (cost=0.00..76017.40 rows=68439 width=48)
>                      Filter: (project_id = 24)
>
> The idea is that the inner-most query would only read the first 100 rows
> it finds, then stop. Instead, if explain is to be believed (and speed
> testing seems to indicate it's accurate), we'll read the entire table,
> *then* pick the first 100 rows. Why is that?
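For reference, a directly runnable form of that suggestion (a sketch:
SELECT EXISTS is used so plain SQL returns the boolean, and :ProjectID is
again bound to 24 as in the plan above):

    -- True only if at least 101 matching rows exist; the scan can stop
    -- as soon as the 101st row is produced (OFFSET 100 skips the first
    -- hundred, LIMIT 1 stops after one more).
    SELECT EXISTS (
        SELECT 1
        FROM email_rank
        WHERE project_id = 24   -- substitute your :ProjectID binding
        OFFSET 100 LIMIT 1
    );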