Active Session History in PostgreSQL: blocker and wait chain

While the active session history extension for PostgreSQL is still in beta, some information is added to it.

The pg_active_session_history view is currently made of:

                   View "public.pg_active_session_history"
      Column      |           Type           | Collation | Nullable | Default
------------------+--------------------------+-----------+----------+---------
 ash_time         | timestamp with time zone |           |          |
 datid            | oid                      |           |          |
 datname          | text                     |           |          |
 pid              | integer                  |           |          |
 usesysid         | oid                      |           |          |
 usename          | text                     |           |          |
 application_name | text                     |           |          |
 client_addr      | text                     |           |          |
 client_hostname  | text                     |           |          |
 client_port      | integer                  |           |          |
 backend_start    | timestamp with time zone |           |          |
 xact_start       | timestamp with time zone |           |          |
 query_start      | timestamp with time zone |           |          |
 state_change     | timestamp with time zone |           |          |
 wait_event_type  | text                     |           |          |
 wait_event       | text                     |           |          |
 state            | text                     |           |          |
 backend_xid      | xid                      |           |          |
 backend_xmin     | xid                      |           |          |
 top_level_query  | text                     |           |          |
 query            | text                     |           |          |
 cmdtype          | text                     |           |          |
 queryid          | bigint                   |           |          |
 backend_type     | text                     |           |          |
 blockers         | integer                  |           |          |
 blockerpid       | integer                  |           |          |
 blocker_state    | text                     |           |          |

You could see it as samplings of pg_stat_activity providing more information:

  • ash_time: the sampling time
  • top_level_query: the top level statement (in case PL/pgSQL is used)
  • query: the statement being executed (not normalised, as it is in pg_stat_statements, means you see the values)
  • cmdtype: the statement type (SELECT,UPDATE,INSERT,DELETE,UTILITY,UNKNOWN,NOTHING)
  • queryid: the queryid of the statement which links to pg_stat_statements
  • blockers: the number of blockers
  • blockerpid: the pid of the blocker (if blockers = 1), the pid of one blocker (if blockers > 1)
  • blocker_state: state of the blocker (state of the blockerpid)

Thanks to the queryid field you are able to link the session activity with the sql activity.

The information related to the blocking activity (if any) has been added recently. Why? To easily drill down in case of session being blocked.

Let’s see how we could display some interesting information in case of blocked session(s), for examples:

  • The wait chain
  • The seconds in wait in this chain
  • The percentage of the total wait time that this chain represents

As PostgreSQL provides recursive query and window functions, let’s make use of them to write this query:

postgres@pgu:~$ cat pg_ash_wait_chain.sql
WITH RECURSIVE search_wait_chain(ash_time,pid, blockerpid, wait_event_type,wait_event,level, path)
AS (
          SELECT ash_time,pid, blockerpid, wait_event_type,wait_event, 1 AS level,
          'pid:'||pid||' ('||wait_event_type||' : '||wait_event||') ->'||'pid:'||blockerpid AS path
          from pg_active_session_history WHERE blockers > 0
        union ALL
          SELECT p.ash_time,p.pid, p.blockerpid, p.wait_event_type,p.wait_event, swc.level + 1 AS level,
          'pid:'||p.pid||' ('||p.wait_event_type||' : '||p.wait_event||') ->'||swc.path AS path
          FROM pg_active_session_history p, search_wait_chain swc
          WHERE p.blockerpid = swc.pid and p.ash_time = swc.ash_time and p.blockers > 0
)
select round(100 * count(*) / cnt)||'%' as "% of total wait",count(*) as seconds,path as wait_chain  from (
        SELECT  pid,wait_event,path,sum(count) over() as cnt from (
                select ash_time,level,pid,wait_event,path,count(*) as count, max(level) over(partition by ash_time,pid) as max_level
                FROM search_wait_chain where level > 0 group by ash_time,level,pid,wait_event,path
        ) as all_wait_chain
        where level=max_level
) as wait_chain
group by path,cnt
order by count(*) desc;

Let’s launch this query while only one session is being blocked by another one:

postgres@pgu:~$ psql -f pg_ash_wait_chain.sql
 % of total wait | seconds |                 wait_chain
-----------------+---------+--------------------------------------------
 100%            |      23 | pid:1890 (Lock : transactionid) ->pid:1888
(1 row)

It means that the pid 1890 is waiting since 23 seconds on the transactionid wait event, while being blocked by pid 1888. This wait chain represents 100% of the blocking activity time.

Now another session comes into the game, query the active session history view one more time:

postgres@pgu:~$ psql -f pg_ash_wait_chain.sql
 % of total wait | seconds |                                  wait_chain
-----------------+---------+------------------------------------------------------------------------------
 88%             |     208 | pid:1890 (Lock : transactionid) ->pid:1888
 12%             |      29 | pid:1913 (Lock : transactionid) ->pid:1890 (Lock : transactionid) ->pid:1888
(2 rows)

So we still see our first blocking chain. It is now not the only one (so represents 88% of the blocking activity time).

We can see a new chain that represents 12% of the blocking activity time:

  • pid 1913 (waiting on transactionid) is blocked since 29 seconds by pid 1890 (waiting on transactionid) that is also blocked by pid 1888.

Let’s commit the transaction hold by pid 1888 and launch the query again 2 times:

postgres@pgu:~$ psql -f pg_ash_wait_chain.sql
 % of total wait | seconds |                                  wait_chain
-----------------+---------+------------------------------------------------------------------------------
 57%             |     582 | pid:1890 (Lock : transactionid) ->pid:1888
 40%             |     403 | pid:1913 (Lock : transactionid) ->pid:1890 (Lock : transactionid) ->pid:1888
 3%              |      32 | pid:1913 (Lock : transactionid) ->pid:1890
(3 rows)

postgres@pgu:~$ psql -f pg_ash_wait_chain.sql
 % of total wait | seconds |                                  wait_chain
-----------------+---------+------------------------------------------------------------------------------
 57%             |     582 | pid:1890 (Lock : transactionid) ->pid:1888
 40%             |     403 | pid:1913 (Lock : transactionid) ->pid:1890 (Lock : transactionid) ->pid:1888
 3%              |      33 | pid:1913 (Lock : transactionid) ->pid:1890
(3 rows)

As you can see the first two chains are still displayed (as the query does not filter on ash_time) but are not waiting anymore (seconds does not increase) while the last one (new one) is still waiting (seconds increase).

Remarks

Conclusion

We have seen how the blocking information part of the pg_active_session_history view could help to drill down in case of blocking activity.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.