Home > mailing lists

Unique values across a table of arrays - documents and tags - Mailing list pgsql-performance

From	Ivan Voras
Subject	Unique values across a table of arrays - documents and tags
Date	November 7, 2012 15:22:14
Msg-id	k7duab$fn4$1@ger.gmane.org Whole thread Raw
Responses	Re: Unique values across a table of arrays - documents and tags
List	pgsql-performance

Tree view

Hello,

I know I need to re-engineer this so it doesn't suck by design, so I'm
wondering if there is some nifty PostgreSQL feature or best practice
which may automagically do the best thing.

I store information about documents which are tagged by string tags. The
structure is very simple:

CREATE TABLE documents (
        id SERIAL NOT NULL,
        title TEXT NOT NULL,
        -- other fields --
        tags TEXT[] NOT NULL,
        flags INTEGER
);

Currently, I have a GIN index on the tags field, and it works for searching:

edem=> explain analyze select id,title,flags from documents where tags
@> ARRAY['tag'];
                                                      QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on documents  (cost=8.00..12.01 rows=1 width=39)
(actual time=0.067..0.086 rows=9 loops=1)
   Recheck Cond: (tags @> '{tag}'::text[])
   ->  Bitmap Index Scan on documents_tags  (cost=0.00..8.00 rows=1
width=0) (actual time=0.053..0.053 rows=9 loops=1)
         Index Cond: (tags @> '{tag}'::text[])
 Total runtime: 0.135 ms
(5 rows)


The other feature I need is a list of unique tags in all the documents,
e.g.:

edem=> explain analyze select distinct unnest(tags) as tag from documents;
                                                 QUERY PLAN

-------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=28.54..28.84 rows=24 width=42) (actual
time=0.261..0.307 rows=44 loops=1)
   ->  Seq Scan on documents  (cost=0.00..28.45 rows=36 width=42)
(actual time=0.020..0.157 rows=68 loops=1)
 Total runtime: 0.419 ms
(3 rows)

This is unfortunately slow (because I know the load will increase and
this will be a common operation).

The thing I was planning to do is create a separate table, with only the
unique tags, and possibly an array of documents which have these tags,
which will be maintained with UPDATE and INSERT triggers on the
documents table, but then I remembered that the GIN index itself does
something not unlike this method. Is there a way to make use of this
information to get a list of unique tags?

Barring that, what would you suggest for efficiently handing a classic
structure like this (meaning documents with tags)?

Attachment

signature.asc

pgsql-performance by date:

From: Tom Lane
Date: 07 November 2012, 15:02:40
Subject: Re: [HACKERS] pg_dump and thousands of schemas

From: François Beausoleil
Date: 07 November 2012, 15:34:27
Subject: Re: Unique values across a table of arrays - documents and tags

Unique values across a table of arrays - documents and tags - Mailing list pgsql-performance

Attachment

Previous

Next