Retrieve PostgreSQL variable-length storage information thanks to pageinspect

Introduction

In PostgreSQL a variable-length datatype value can be stored in-line or out-of-line (as a TOAST). It can also be compressed or not (see the documentation for more details).

Let’s make use of the pageinspect extension and the information about variable-length datatype found in postgres.h to build a query to retrieve tuples variable-length storage information.

The query

The query is the following:

$ cat toast_info.sql
select
t_ctid,
-- See postgres.h
CASE
  WHEN (fo is NULL) THEN 'null'
  -- VARATT_IS_EXTERNAL_ONDISK: VARATT_IS_EXTERNAL (fo = 'x01') && tag == VARTAG_ONDISK (x12)
  -- rawsize - VARHDRSZ > extsize
  WHEN (fo = 'x01') AND (tag = 'x12') AND (osize - 4 > ssize) THEN 'toasted (compressed)'
-- rawsize - VARHDRSZ <= extsize
  WHEN (fo = 'x01') AND (tag = 'x12') AND (osize - 4 <= ssize) THEN 'toasted (uncompressed)'
  -- VARATT_IS_EXTERNAL_INDIRECT: VARATT_IS_EXTERNAL && tag == VARTAG_INDIRECT (x01)
  WHEN (fo = 'x01') AND (tag = 'x01') then 'indirect in-memory'
  -- VARATT_IS_EXTERNAL_EXPANDED: VARATT_IS_EXTERNAL && VARTAG_IS_EXPANDED(VARTAG_EXTERNAL)
  WHEN (fo = 'x01') AND (tag = 'x02' OR tag = 'x03') then 'expanded in-memory'
  -- VARATT_IS_SHORT (va_header & 0x01) == 0x01)
  WHEN (fo & 'x01' = 'x01') THEN 'short in-line'
  -- VARATT_IS_COMPRESSED (va_header & 0x03) == 0x02)
  WHEN (fo & 'x03' = 'x02') THEN 'long in-line (compressed)'
  ELSE 'long in-line (uncompressed)'
END as toast_info
from
(
select
page_items.t_ctid,
substr(page_items.t_attrs[1]::text,2,3)::bit(8) as fo,
('x'||substr(page_items.t_attrs[1]::text,5,2))::bit(8) as tag,
('x'||regexp_replace(substr(page_items.t_attrs[1]::text,7,8),'(\w\w)(\w\w)(\w\w)(\w\w)','\4\3\2\1'))::bit(32)::int as osize ,
('x'||regexp_replace(substr(page_items.t_attrs[1]::text,15,8),'(\w\w)(\w\w)(\w\w)(\w\w)','\4\3\2\1'))::bit(32)::int as ssize
from
generate_series(0, pg_relation_size('bdttoast'::regclass::text) / 8192 - 1) blkno ,
heap_page_item_attrs(get_raw_page('bdttoast',blkno::int), 'bdttoast'::regclass) as page_items
) as hp;

As you can see, the query focus on the bdttoast table. Let’s create this table and put some data in it to see the query in action.

Let’s see the query in action

Let’s create this table:

postgres=# CREATE TABLE bdttoast ( message text );
CREATE TABLE

the message field storage type is “extended”:

postgres=# \d+ bdttoast
                                 Table "public.bdttoast"
 Column  | Type | Collation | Nullable | Default | Storage  | Stats target | Description
---------+------+-----------+----------+---------+----------+--------------+-------------
 message | text |           |          |         | extended |              |

extended allows both compression and out-of-line storage. This is the default for most TOAST-able data types. Compression will be attempted first, then out-of-line storage if the row is still too big.

let’s add one tuple:

postgres=# INSERT INTO bdttoast VALUES ('default');
INSERT 0 1

and check how the message field has been stored thanks to the query:

postgres=# \i toast_info.sql
 t_ctid |  toast_info
--------+---------------
 (0,1)  | short in-line
(1 row)

as you can see it has been stored in-line.

Add another tuple with more data in the message field, and check its storage information:

postgres=# INSERT INTO bdttoast VALUES (repeat('a',10000));
INSERT 0 1
postgres=# \i toast_info.sql
 t_ctid |        toast_info
--------+---------------------------
 (0,1)  | short in-line
 (0,2)  | long in-line (compressed)
(2 rows)

as you can see this field value has been stored as long in-line and is compressed.

Add another tuple with even more data in the message field, and check its storage information:

postgres=# INSERT INTO bdttoast VALUES (repeat('b',1000000));
INSERT 0 1
postgres=# \i toast_info.sql
 t_ctid |        toast_info
--------+---------------------------
 (0,1)  | short in-line
 (0,2)  | long in-line (compressed)
 (0,3)  | toasted (compressed)
(3 rows)

this time it has been stored as TOAST-ed and is compressed.

Let’s change the message column storage to external:

postgres=# ALTER TABLE bdttoast ALTER COLUMN message SET STORAGE EXTERNAL;
ALTER TABLE
postgres=# \d+ bdttoast
                                 Table "public.bdttoast"
 Column  | Type | Collation | Nullable | Default | Storage  | Stats target | Description
---------+------+-----------+----------+---------+----------+--------------+-------------
 message | text |           |          |         | external |              |

external means it allows out-of-line storage but not compression.

Now, let’s add 3 tuples with the same field message size as the 3 ones previously added and compare the storage information.

Let’s add one tuple with the same message size as the one previously stored as “short in-line”:

postgres=# INSERT INTO bdttoast VALUES ('externa');
INSERT 0 1
postgres=# \i toast_info.sql
 t_ctid |        toast_info
--------+---------------------------
 (0,1)  | short in-line
 (0,2)  | long in-line (compressed)
 (0,3)  | toasted (compressed)
 (0,4)  | short in-line
(4 rows)

this one is still short in-line (same as t_ctid (0,1)).

Let’s add one tuple with the same message size as the one previously stored as “long in-line (compressed)”:

postgres=# INSERT INTO bdttoast VALUES (repeat('c',10000));
INSERT 0 1
postgres=# \i toast_info.sql
 t_ctid |        toast_info
--------+---------------------------
 (0,1)  | short in-line
 (0,2)  | long in-line (compressed)
 (0,3)  | toasted (compressed)
 (0,4)  | short in-line
 (0,5)  | toasted (uncompressed)
(5 rows)

this one is TOAST-ed and uncompressed with storage external (as compare with t_ctid (0,2) with storage extended).

Let’s add one tuple with the same message size as the one previously stored as “toasted (compressed)”:

postgres=# INSERT INTO bdttoast VALUES (repeat('d',1000000));
INSERT 0 1
postgres=# \i toast_info.sql
 t_ctid |        toast_info
--------+---------------------------
 (0,1)  | short in-line
 (0,2)  | long in-line (compressed)
 (0,3)  | toasted (compressed)
 (0,4)  | short in-line
 (0,5)  | toasted (uncompressed)
 (0,6)  | toasted (uncompressed)
(6 rows)

this one is TOAST-ed and uncompressed with storage external (as compare with t_ctid (0,3) with storage extended).

Remarks

  • use this query on little-endian machines only (bit layouts would not be the same on big-endian and would impact the query accuracy)
  • t_attrs[1] is used in the query to retrieve the information. This is because the message field is the 1st of the relation

Conclusion

Thanks to the pageinspect extension we have been able to write a query to retrieve variable-length storage information. We have been able to compare how our data has been stored depending on the column storage being used (extended or external).

Get toast chunk_id from the user table tuples or from the toast index thanks to pageinspect

Introduction

TOAST stands for “The Oversized-Attribute Storage Technique” and allows to broke up large fields value into multiple physical rows (see the PostgreSQL documentation for more details).

The goal of this post is to provide a way to retrieve toast’s information from the user table tuples or from the toast index without querying the toast directly.

We will be able to link the user table tuples to the toast pages by using user table tuples and toast index data (not querying the toast at all).

Build the playground

Create a table with a TOAST-able field:

toasted=# CREATE TABLE bdttab (id int,message text,age int);
CREATE TABLE

toasted=# \d+ bdttab
                                   Table "public.bdttab"
 Column  |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description
---------+---------+-----------+----------+---------+----------+--------------+-------------
 id      | integer |           |          |         | plain    |              |
 message | text    |           |          |         | extended |              |
 age     | integer |           |          |         | plain    |              |
Access method: heap

Insert 10 rows:

toasted=# INSERT INTO bdttab
SELECT 1,(SELECT
string_agg(chr(floor(random() * 26)::int + 65), '')
FROM generate_series(1,10000)),6
FROM generate_series(1,10);
INSERT 0 10

check the toast and toast index names:

toasted=# select r.relname,t.relname as toast,i.relname as toast_index from pg_class r, pg_class i, pg_index d, pg_class t where r.relname = 'bdttab' and d.indrelid = r.reltoastrelid and i.oid = d.indexrelid and t.oid = r.reltoastrelid;
relname | toast | toast_index
---------+-----------------+-----------------------
bdttab | pg_toast_192881 | pg_toast_192881_index
(1 row)

Retrieve the chunk_id for each tuple directly from the user table

The chunk_id is part of the tuple’s data as explained in this slide (coming from this presentation):

Note that the toast relation id is also part of the tuple data.

As you can see from the slide we can get the information with pageinspect, so let’s use it to build a query to retrieve the chunk_id and the toast relation id from the tuples:

create the extension:

toasted=# create extension pageinspect;
CREATE EXTENSION

and query the user table to get the information:

toasted=# select
page_item_attrs.t_ctid,
page_item_attrs.t_attrs[2],
substr(substr(page_item_attrs.t_attrs[2],octet_length(page_item_attrs.t_attrs[2])-7,4)::text,3) as substr_for_chunk_id,
('x'||regexp_replace(substr(substr(page_item_attrs.t_attrs[2],octet_length(page_item_attrs.t_attrs[2])-7,4)::text,3),'(\w\w)(\w\w)(\w\w)(\w\w)','\4\3\2\1'))::bit(32)::int as chunk_id,
substr(substr(page_item_attrs.t_attrs[2],octet_length(page_item_attrs.t_attrs[2])-3,4)::text,3) as substr_for_toast_relid,
('x'||regexp_replace(substr(substr(page_item_attrs.t_attrs[2],octet_length(page_item_attrs.t_attrs[2])-3,4)::text,3),'(\w\w)(\w\w)(\w\w)(\w\w)','\4\3\2\1'))::bit(32)::int as toast_relid
FROM
generate_series(0, pg_relation_size('bdttab'::regclass::text) / 8192 - 1) blkno ,
heap_page_item_attrs(get_raw_page('bdttab', blkno::int), 'bdttab'::regclass) as page_item_attrs
where
substr(page_item_attrs.t_attrs[2]::text,3,2)='01';
 t_ctid |                t_attrs                 | substr_for_chunk_id | chunk_id | substr_for_toast_relid | toast_relid
--------+----------------------------------------+---------------------+----------+------------------------+-------------
 (0,1)  | \x0112142700001027000077f1020074f10200 | 77f10200            |   192887 | 74f10200               |      192884
 (0,2)  | \x0112142700001027000078f1020074f10200 | 78f10200            |   192888 | 74f10200               |      192884
 (0,3)  | \x0112142700001027000079f1020074f10200 | 79f10200            |   192889 | 74f10200               |      192884
 (0,4)  | \x011214270000102700007af1020074f10200 | 7af10200            |   192890 | 74f10200               |      192884
 (0,5)  | \x011214270000102700007bf1020074f10200 | 7bf10200            |   192891 | 74f10200               |      192884
 (0,6)  | \x011214270000102700007cf1020074f10200 | 7cf10200            |   192892 | 74f10200               |      192884
 (0,7)  | \x011214270000102700007df1020074f10200 | 7df10200            |   192893 | 74f10200               |      192884
 (0,8)  | \x011214270000102700007ef1020074f10200 | 7ef10200            |   192894 | 74f10200               |      192884
 (0,9)  | \x011214270000102700007ff1020074f10200 | 7ff10200            |   192895 | 74f10200               |      192884
 (0,10) | \x0112142700001027000080f1020074f10200 | 80f10200            |   192896 | 74f10200               |      192884
(10 rows)

as you can see we are able to get the chunk_id and the toast relation id from the tuples.

Let’s verify that those values make sense, first checking the toast relation id:

toasted=# select relname from pg_class where oid=192884;
     relname
-----------------
 pg_toast_192881
(1 row)

and then checking the chunk_ids from the toast itself:

toasted=# select distinct(chunk_id) from pg_toast.pg_toast_192881;
 chunk_id
----------
   192894
   192895
   192889
   192888
   192893
   192892
   192896
   192891
   192887
   192890
(10 rows)

The values match the ones we got directly from the user table tuples.

Retrieve the chunk_id and chunk_seq directly from the toast index

The toast index contains those information, so let’s query it too:

toasted=# select
page_items.ctid,
page_items.data,
('x'||regexp_replace(substr(page_items.data,1,11),'(\w\w) (\w\w) (\w\w) (\w\w)','\4\3\2\1'))::bit(32)::int as chunk_id,
('x'||regexp_replace(substr(page_items.data,13,23),'(\w\w) (\w\w) (\w\w) (\w\w)','\4\3\2\1'))::bit(32)::int as chunk_seq
FROM
generate_series(1, pg_relation_size('pg_toast.pg_toast_192881_index'::regclass::text) / 8192 - 1) blkno ,
bt_page_items('pg_toast.pg_toast_192881_index', blkno::int) as page_items;

  ctid  |          data           | chunk_id | chunk_seq
--------+-------------------------+----------+-----------
 (0,1)  | 77 f1 02 00 00 00 00 00 |   192887 |         0
 (0,2)  | 77 f1 02 00 01 00 00 00 |   192887 |         1
 (0,3)  | 77 f1 02 00 02 00 00 00 |   192887 |         2
 (0,4)  | 77 f1 02 00 03 00 00 00 |   192887 |         3
 (1,1)  | 77 f1 02 00 04 00 00 00 |   192887 |         4
 (1,2)  | 77 f1 02 00 05 00 00 00 |   192887 |         5
 (1,3)  | 78 f1 02 00 00 00 00 00 |   192888 |         0
 (1,4)  | 78 f1 02 00 01 00 00 00 |   192888 |         1
 (2,1)  | 78 f1 02 00 02 00 00 00 |   192888 |         2
 (2,2)  | 78 f1 02 00 03 00 00 00 |   192888 |         3
 (2,3)  | 78 f1 02 00 04 00 00 00 |   192888 |         4
 (2,4)  | 78 f1 02 00 05 00 00 00 |   192888 |         5
 (3,1)  | 79 f1 02 00 00 00 00 00 |   192889 |         0
 (3,2)  | 79 f1 02 00 01 00 00 00 |   192889 |         1
 (3,3)  | 79 f1 02 00 02 00 00 00 |   192889 |         2
 (3,4)  | 79 f1 02 00 03 00 00 00 |   192889 |         3
 (4,1)  | 79 f1 02 00 04 00 00 00 |   192889 |         4
 (4,2)  | 79 f1 02 00 05 00 00 00 |   192889 |         5
 (4,3)  | 7a f1 02 00 00 00 00 00 |   192890 |         0
 (4,4)  | 7a f1 02 00 01 00 00 00 |   192890 |         1
 (5,1)  | 7a f1 02 00 02 00 00 00 |   192890 |         2
 (5,2)  | 7a f1 02 00 03 00 00 00 |   192890 |         3
.
.
.
 (12,3) | 7f f1 02 00 02 00 00 00 |   192895 |         2
 (12,4) | 7f f1 02 00 03 00 00 00 |   192895 |         3
 (13,1) | 7f f1 02 00 04 00 00 00 |   192895 |         4
 (13,2) | 7f f1 02 00 05 00 00 00 |   192895 |         5
 (13,3) | 80 f1 02 00 00 00 00 00 |   192896 |         0
 (13,4) | 80 f1 02 00 01 00 00 00 |   192896 |         1
 (14,1) | 80 f1 02 00 02 00 00 00 |   192896 |         2
 (14,2) | 80 f1 02 00 03 00 00 00 |   192896 |         3
 (14,3) | 80 f1 02 00 04 00 00 00 |   192896 |         4
 (14,4) | 80 f1 02 00 05 00 00 00 |   192896 |         5
(60 rows)

Note that the chunk_ids coming from the index, the user table tuples and the toast itself match.

Use case example: a toast’s page is corrupted and I want to know which tuples from the user table are affected

Say the page 12 of the toast is corrupted, then we could get the affected chunk_id and chunk_seq from the toast index that way (by filtering on the ctid):

toasted=# select
page_items.ctid,
page_items.data,
('x'||regexp_replace(substr(page_items.data,1,11),'(\w\w) (\w\w) (\w\w) (\w\w)','\4\3\2\1'))::bit(32)::int as chunk_id,
('x'||regexp_replace(substr(page_items.data,13,23),'(\w\w) (\w\w) (\w\w) (\w\w)','\4\3\2\1'))::bit(32)::int as chunk_seq
FROM
generate_series(1, pg_relation_size('pg_toast.pg_toast_192881_index'::regclass::text) / 8192 - 1) blkno ,
bt_page_items('pg_toast.pg_toast_192881_index', blkno::int) as page_items where ctid::text like '(12,%';

  ctid  |          data           | chunk_id | chunk_seq
--------+-------------------------+----------+-----------
 (12,1) | 7f f1 02 00 00 00 00 00 |   192895 |         0
 (12,2) | 7f f1 02 00 01 00 00 00 |   192895 |         1
 (12,3) | 7f f1 02 00 02 00 00 00 |   192895 |         2
 (12,4) | 7f f1 02 00 03 00 00 00 |   192895 |         3
(4 rows)

And then look back at the user table tuples that way by filtering on the chunk_id:

toasted=# select
page_item_attrs.t_ctid,
page_item_attrs.t_attrs[2],
substr(substr(page_item_attrs.t_attrs[2],octet_length(page_item_attrs.t_attrs[2])-7,4)::text,3) as substr_for_chunk_id,
('x'||regexp_replace(substr(substr(page_item_attrs.t_attrs[2],octet_length(page_item_attrs.t_attrs[2])-7,4)::text,3),'(\w\w)(\w\w)(\w\w)(\w\w)','\4\3\2\1'))::bit(32)::int as chunk_id,
substr(substr(page_item_attrs.t_attrs[2],octet_length(page_item_attrs.t_attrs[2])-3,4)::text,3) as substr_for_toast_relid,
('x'||regexp_replace(substr(substr(page_item_attrs.t_attrs[2],octet_length(page_item_attrs.t_attrs[2])-3,4)::text,3),'(\w\w)(\w\w)(\w\w)(\w\w)','\4\3\2\1'))::bit(32)::int as toast_relid
FROM
generate_series(0, pg_relation_size('bdttab'::regclass::text) / 8192 - 1) blkno ,
heap_page_item_attrs(get_raw_page('bdttab', blkno::int), 'bdttab'::regclass) as page_item_attrs
where
substr(page_item_attrs.t_attrs[2]::text,3,2)='01' and ('x'||regexp_replace(substr(substr(page_item_attrs.t_attrs[2],octet_length(page_item_attrs.t_attrs[2])-7,4)::text,3),'(\w\w)(\w\w)(\w\w)(\w\w)','\4\3\2\1'))::bit(32)::int = 192895;

 t_ctid |                t_attrs                 | substr_for_chunk_id | chunk_id | substr_for_toast_relid | toast_relid
--------+----------------------------------------+---------------------+----------+------------------------+-------------
 (0,9)  | \x011214270000102700007ff1020074f10200 | 7ff10200            |   192895 | 74f10200               |      192884
(1 row)

so that we know that this tuple with ctid (0,9) is linked to the corrupted toast page.

Remarks

  • t_attrs[2] is used in the query to retrieve the information from the user table. This is because the TOAST-able field (message) is the 2nd of the relation.
  • substr(page_item_attrs.t_attrs[2]::text,3,2)=’01’ is used to filter the user table tuples information. This is because 0x01 is used as TOASTed string marker (see the slide).

Conclusion

Thanks to pageinspect we have been able to retrieve the chunk_id directly from the user table tuples. We also have been able to link the user table tuples to the toast pages by using user table tuples and toast index data (not querying the toast at all).