Sunday, January 23, 2022

Performance and Storage comparisons between UUID and BigSerial (BigInt)

 UUID: Universally Unique Identifiers (UUID) 

·        Defined by RFC 4122, ISO/IEC 9834-8:2005, and related standards

·        Known as globally unique identifier, or GUID

·        Example: a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11

·        Length in Text: 36 characters

·        Storage in UUID: 32 digits = 128bits = 16bytes

 

Bigserial: 

·        Same as “SERIAL8”

·        Range : 1 - 9223372036854775807

·        “create table t (id bigserial)” same as:

o    create sequence t_id_seq;

o    create table t(id bigint not null default nextval('t_id_seq’));

o    alter sequence t_id_seq owned by t.id;


Install Extension:

create extension if not exists "uuid-ossp";


mytest=> \df

                                 List of functions

 Schema |        Name        | Result data type |    Argument data types    | Type

--------+--------------------+------------------+---------------------------+------

 public | uuid_generate_v1   | uuid             |                           | func

 public | uuid_generate_v1mc | uuid             |                           | func

 public | uuid_generate_v3   | uuid             | namespace uuid, name text | func

 public | uuid_generate_v4   | uuid             |                           | func

 public | uuid_generate_v5   | uuid             | namespace uuid, name text | func

 public | uuid_nil           | uuid             |                           | func

 public | uuid_ns_dns        | uuid             |                           | func

 public | uuid_ns_oid        | uuid             |                           | func

 public | uuid_ns_url        | uuid             |                           | func

 public | uuid_ns_x500       | uuid             |                           | func


Insert Performance:


create table t1(c1 uuid primary key, c2 char(100));

create table t2(c1 char(36) primary key, c2 char(100));

create table t3(c1 bigserial primary key, c2 char(100));


mytest=> insert into t1 select uuid_generate_v4(),'x' from generate_series(1,1000000);

INSERT 0 1000000

Time: 8165.939 ms (00:08.166)


mytest=> insert into t2 select uuid_generate_v4(),'x' from generate_series(1,1000000);

INSERT 0 1000000

Time: 12102.101 ms (00:12.102)


mytest=> insert into t3(c2) select 'x' from generate_series(1,1000000);

INSERT 0 1000000

Time: 5617.863 ms (00:05.618)

 

Storage Comparison:


mytest=> \di+

                                List of relations

 Schema |  Name   | Type  |  Owner   | Table | Persistence | Size  | Description

--------+---------+-------+----------+-------+-------------+-------+-------------

 public | t1_pkey | index | postgres | t1    | permanent   | 38 MB |

 public | t2_pkey | index | postgres | t2    | permanent   | 73 MB |

 public | t3_pkey | index | postgres | t3    | permanent   | 21 MB |

(3 rows)


select * from (

(select pg_column_size(c1) as "t1.c1" from t1 limit 1) as a1 cross join

(select pg_column_size(c1) as "t2.c1" from t2 limit 1) as a2 cross join

(select pg_column_size(c1) as "t3.c1" from t3 limit 1) as a3);

 t1.c1 | t2.c1 | t3.c1

-------+-------+-------

    16 |    37 |     8

 


Other Considerations:


       Large size could mean bloat, large WAL logs, more space in both storage and memory

       Randomization means more random IOs for the index, less efficient buffer cache as hot data spread evenly.

       Foreign Key (FK) Constraints usually reference to PK.

       Max(id) doesn’t make sense to you any more.