Thread: Best way to check if a table is empty

Best way to check if a table is empty

From
Marcelo Fernandes
Date:
Hi folks,

I came up with three strategies to verify whether a table is empty.

I wanted to sound the community to check whether my assumptions are correct for
each of these strategies, and to also discuss which strategy is best.

## Strategy 1 [possibly best?]

SELECT EXISTS (SELECT 1 FROM foo LIMIT 1);

Pros:
  1. Works with any table.
  2. Relatively fast (if the table is well organised).

Cons:
  1. Sequential Scan
  2. If the table is bloated, it reads more buffers.

## Strategy 2

SELECT min(id) FROM foo;

Pros:
  1. Does an index-only scan on a field that presumably has a PK index.
  2. Works well even if the table is bloated.

Cons:
  1. Sequential Scan if the table does not have a PK index.
  2. Reads a few more buffers than Strategy 1 when the table is well organised.
  3. Performs worse if the index is bloated.

## Strategy 3 [worst]

SELECT count(*) FROM foo;

Pros:
  1. Uses a widespread and intuitive operation (count)

Cons:
  1. Very slow on large tables as it performs a Sequential Scan.


How does all of that sound? Are there further strategies I should consider?
Anything I have missed in the Strategies above?

Regards,
Marcelo.



Re: Best way to check if a table is empty

From
"David G. Johnston"
Date:
On Sunday, March 23, 2025, Marcelo Fernandes <marcefern7@gmail.com> wrote:
Hi folks,

I came up with three strategies to verify whether a table is empty.

3 is strictly terrible worse to answer “is live row count > 0”.

Using an index likely serves no/negative benefit since it contains no tuple liveness information and you now are doing more buffers and lookups (IOW, your claims about pro/con for min(id) require assumptions you haven’t stipulated apply here).  Maybe IOS helps though I do wonder whether a sequential scan skips over known all-dead pages making that relative benefit go away.

So, absent data and an idea of why, just start looking at heap pages until you find a live tuple, then stop.  Exists already optimizes for early stop, the limit is pointless.

Checking table statistics first is probably a net-positive for many use cases.

But maybe rethink your data and processing models if doing this check on large bloated tables is what your existing choices have led to.

David J.

Re: Best way to check if a table is empty

From
Christophe Pettus
Date:

> On Mar 23, 2025, at 20:42, Marcelo Fernandes <marcefern7@gmail.com> wrote:
> Cons:
>  1. Sequential Scan
>  2. If the table is bloated, it reads more buffers.

These concerns probably sound worse than they really are.  Assuming the table is being vacuumed reliably, and is
receivinginserts, those inserts will tend to be at the start of the table, and so you'll hit a live tuple pretty fast.
Thereare pathological cases where it will take a long time (someone just delete a DELETE FROM t; from a huge table and
theentire thing is dead tuples but no free space), but that's relatively unlikely to happen in a real-world system. 

> Pros:
>  1. Does an index-only scan on a field that presumably has a PK index.
>  2. Works well even if the table is bloated.

#1 is not necessarily true.  You're never guaranteed an index-only scan; PostgreSQL may decide that the visibility map
informationindicates that a regular index scan (which means heap fetches to check visibility) will still happen. 

#2 is not necessarily true either; it's pretty much false in the same situation that a sequential scan for the first
tuplewill be slow.  Indexes get bloated, too, and if there are a large number of dead unreclaimed tuples, there are
alsogoing to be a lot of dead index entries for it to walk over. 

(And to be pedantic, #1 returns TRUE / FALSE while #2 returns <int> / NULL, so they aren't exactly equivalent.  If you
areable to handle <int> / NULL, you don't need the EXISTS clause.) 

> SELECT count(*) FROM foo;

There's no situation in which this will be faster than #1 or #2.


Re: Best way to check if a table is empty

From
"David G. Johnston"
Date:
On Sunday, March 23, 2025, David G. Johnston <david.g.johnston@gmail.com> wrote:

Maybe IOS helps though I do wonder whether a sequential scan skips over known all-dead pages making that relative benefit go away.

Well, no, since it tracks known visible, not known non-visible, though for something like a fast exists test with no where clause if the VM had any 1 bits the answer to the exists would be yes.  No idea if we take that shortcut.

David J.

Re: Best way to check if a table is empty

From
Christophe Pettus
Date:

> On Mar 23, 2025, at 21:15, David G. Johnston <david.g.johnston@gmail.com> wrote:
>
> No idea if we take that shortcut.

I remember looking into that not too long ago, and the answer's no.




Re: Best way to check if a table is empty

From
"David G. Johnston"
Date:
On Sunday, March 23, 2025, Christophe Pettus <xof@thebuild.com> wrote:

(And to be pedantic, #1 returns TRUE / FALSE while #2 returns <int> / NULL, so they aren't exactly equivalent.  If you are able to handle <int> / NULL, you don't need the EXISTS clause.)

Leaving the limit in place, without exists #1 returns either an empty set or an <int>, never NULL.

Always returning a Boolean seems like a better API choice though; but standardizing on that final transform doesn’t change the base comparison.

David J.