Re: Scrub one large table against another - Mailing list pgsql-performance

From Brendan Curran
Subject Re: Scrub one large table against another
Date
Msg-id 452E83D0.5060309@gmail.com
Whole thread Raw
In response to Re: Scrub one large table against another  ("Jim C. Nasby" <jim@nasby.net>)
Responses Re: Scrub one large table against another
List pgsql-performance
> Well, IN != EXISTS != JOIN. Exists just stops as soon as it finds a
> record. For some cases, it's equivalent to IN, but not all. IN has to
> de-duplicate it's list in some fashion. For small IN lists, you can do
> this with an OR, but at some point you need to switch to an actual
> unique (actually, I suspect the difference in PostgreSQL just depends on
> if you passed values into IN or a subquery). A join on the other hand
> doesn't worry about duplicates at all. There may be some brains in the
> planner that realize if a subquery will return a unique set (ie: you're
> querying on a primary key).
>

I agree, and it makes sense now that I consider it that IN would force the planner to implement some
form of unique check - possibly leveraging a PK or unique index if one is already available. Maybe
I'll tack up a note to the online documentation letting people know so that it's a little more
explicitly clear that when you choose IN on data that isn't explicitly unique (to the planner i.e.
post-analyze) you get the baggage of a forced unique whether you need it or not. Or perhaps someone
that knows the internals of the planner a little better than me should put some info up regarding that?

>
>> Just one more thing... I have found that maintaining a btree index on a
>> varchar(255) value is extremely expensive on insert/update/delete. It is
>> unfortunately necessary for me to maintain this index for queries and
>> reports so I am transitioning to using an unindexed staging table to
>> import data into before merging it with the larger table. All the docs
>> and posts recommend is to drop the index, import your data, and then
>> create the index again. This is untenable on a daily / bi-weekly basis.
>> Is there a more elegant solution to this indexing problem?
>
> You might be happier with tsearch than a regular index.

Thanks, I'll look into using tsearch2 as a possibility. From what I've seen so far it would add
quite a bit of complexity (necessary updates after inserts, proprietary query syntax that might
require a large amount of specialization from client apps) but in the end the overhead may be less
than that of maintaining the btree.

Thanks and Regards,
B

pgsql-performance by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: FW: Simple join optimized badly?
Next
From: Tom Lane
Date:
Subject: Re: Scrub one large table against another