Sunday, July 3, 2022

B-Tree index deduplication enabled by default since PostgreSQL 13

 PostgreSQL B-Tree indexes are multi-level tree structures, where each level of the tree can be used as a doubly-linked list of pages. Each leaf page contains tuples that point to table rows.

A new feature introduced in PostgreSQL 13 called "deduplication". A duplicate is a leaf page tuple (a tuple that points to a table row) where all indexed key columns have values that match corresponding column values from at least one other leaf page tuple in the same index. More information refers to official documentation: https://www.postgresql.org/docs/13/btree-implementation.html#BTREE-DEDUPLICATION

Use "deduplicate_items" to explicitly disable deduplicate for write-heavy workload with no duplication to avoid performance overheads associated with de-duplication logic, which is default to enabled.

Demo Setup

create table salesorder (
    id          integer, 
    salerep     varchar(10), 
    order_date  date,
    amount numeric(10,2)
);
insert into salesorder 
select generate_series, 
       'salesrep'||mod(generate_series,100),
        concat(generate_series, ' day')::interval+current_date,
        round((random()*100)::numeric,2)
from generate_series(1, 1000000);
vacuum analyze salesorder;

Create indexes

create index id_dedup_on on salesorder(id);
create index id_dedup_off on salesorder(id)  WITH (deduplicate_items = off);
create index salerep_dedup_on on salesorder(salerep);
create index salerep_dedup_off on salesorder(salerep)  WITH (deduplicate_items = off);

Compare table and index sizes

mytest=> \d+
                                     List of relations
 Schema |    Name    | Type  |  Owner   | Persistence | Access method | Size  | Description
--------+------------+-------+----------+-------------+---------------+-------+-------------
 public | salesorder | table | postgres | permanent   | heap          | 57 MB |
(1 row)

mytest=> \di+
                                                List of relations
 Schema |       Name        | Type  |  Owner   |   Table    | Persistence | Access method |  Size   | Description
--------+-------------------+-------+----------+------------+-------------+---------------+---------+-------------
 public | id_dedup_off      | index | postgres | salesorder | permanent   | btree         | 21 MB   |
 public | id_dedup_on       | index | postgres | salesorder | permanent   | btree         | 21 MB   |
 public | salerep_dedup_off | index | postgres | salesorder | permanent   | btree         | 30 MB   |
 public | salerep_dedup_on  | index | postgres | salesorder | permanent   | btree         | 6912 kB |
(4 rows)

No comments:

Post a Comment