Count data content changes
There are six functions that return statistics about recorded data content changes:
emaj_log_stat_group() and emaj_log_stat_groups() quickly deliver, for each table from one or several tables groups, the number of changes that have been recorded in the related log tables, either between 2 marks or since a given mark,
emaj_detailed_log_stat_group() and emaj_detailed_log_stat_groups() provide more detailed information than emaj_log_stat_group(), the number of updates been reported per table, SQL verb type and connection role,
emaj_sequence_stat_group() and emaj_sequence_stat_groups() return statistics about how sequences from one or several tables groups change between two marks or since a given mark.
These functions can be used by emaj_adm and emaj_viewer E-Maj roles.
Global statistics about log tables contents
Full global statistics about logs content are available with this SQL statement:
SELECT * FROM emaj.emaj_log_stat_group('<group.name>', '<start.mark>', '<end.mark>');
The function returns a set of rows, whose type is named emaj.emaj_log_stat_type, and contains the following columns:
Column |
Type |
Description |
---|---|---|
stat_group |
TEXT |
tables group name |
stat_schema |
TEXT |
schema name |
stat_table |
TEXT |
table name |
stat_first_mark |
TEXT |
mark name of the period start |
stat_first_mark_datetime |
TIMESTAMPTZ |
mark timestamp of the period start |
stat_last_mark |
TEXT |
mark name of the period end |
stat_last_mark_datetime |
TIMESTAMPTZ |
mark timestamp of the period end |
stat_rows |
BIGINT |
number of updates recorded into the related log table |
A NULL value supplied as end mark represents the current state.
The keyword ‘EMAJ_LAST_MARK’ can be used as mark name. It then represents the last set mark.
The function returns one row per table, even if there is no logged update for this table. In this case, stat_rows columns value is 0.
Most of the time, the stat_first_mark, stat_first_mark_datetime, stat_last_mark and stat_last_mark_datetime columns reference the start and end marks of the requested period. But they can contain other values when a table has been added or removed from the tables group during the requested time interval.
It is possible to easily execute more precise requests on these statistics. For instance, it is possible to get the number of database updates by application schema, with a statement like:
postgres=# SELECT stat_schema, sum(stat_rows)
FROM emaj.emaj_log_stat_group('myAppl1', NULL, NULL)
GROUP BY stat_schema;
stat_schema | sum
-------------+-----
myschema | 41
(1 row)
There is no need for log table scans to get these statistics. For this reason, they are delivered quickly.
But returned values may be approximative (in fact over-estimated). This occurs in particular when transactions executed between both requested marks have performed table updates before being cancelled.
Using the emaj_log_stat_groups() function, log statistics can be obtained for several groups at once:
SELECT * FROM emaj.emaj_log_stat_groups('<group.names.array>', '<start.mark>', '<end.mark>');
More information about multi-groups functions.
Detailed statistics about logs
Scanning log tables brings a more detailed information, at a higher response time cost. So can we get fully detailed statistics with the following SQL statement:
SELECT * FROM emaj.emaj_detailed_log_stat_group('<group.name>', '<start.mark>', '<end.mark>');
The function returns a set of rows, whose type is named emaj.emaj_detailed_log_stat_type, and contains the following columns:
Column |
Type |
Description |
---|---|---|
stat_group |
TEXT |
tables group name |
stat_schema |
TEXT |
schema name |
stat_table |
TEXT |
table name |
stat_first_mark |
TEXT |
mark name of the period start |
stat_first_mark_datetime |
TIMESTAMPTZ |
mark timestamp of the period start |
stat_last_mark |
TEXT |
mark name of the period end |
stat_last_mark_datetime |
TIMESTAMPTZ |
mark timestamp of the period end |
stat_role |
TEXT |
connection role |
stat_verb |
TEXT |
type of the SQL verb that has performed the update, with values: INSERT / UPDATE / DELETE) |
stat_rows |
BIGINT |
number of updates recorded into the related log table |
A NULL value supplied as end mark represents the current state.
The keyword ‘EMAJ_LAST_MARK’ can be used as mark name. It then represents the last set mark.
Unlike emaj_log_stat_group(), the emaj_detailed_log_stat_group() function doesn’t return any rows for tables having no logged updates inside the requested marks range. So stat_rows column never contains 0.
Most of the time, the stat_first_mark, stat_first_mark_datetime, stat_last_mark and stat_last_mark_datetime columns reference the start and end marks of the requested period. But they can contain other values when a table has been added or removed from the tables group during the requested time interval.
Using the emaj_detailed_log_stat_groups() function, detailed log statistics can be obtained for several groups at once:
SELECT * FROM emaj.emaj_detailed_log_stat_groups('<group.names.array>', '<start.mark>', '<end.mark>');
More information about multi-groups functions.
Statistics about sequence changes
Global statistics about how sequences change are available with this SQL statement:
SELECT * FROM emaj.emaj_sequence_stat_group('<group.name>', '<start.mark>', '<end.mark>');
The function returns a set of rows, whose type is named emaj.emaj_sequence_stat_type, and contains the following columns:
Column |
Type |
Description |
---|---|---|
stat_group |
TEXT |
tables group name |
stat_schema |
TEXT |
schema name |
stat_sequence |
TEXT |
sequence name |
stat_first_mark |
TEXT |
mark name of the period start |
stat_first_mark_datetime |
TIMESTAMPTZ |
mark timestamp of the period start |
stat_last_mark |
TEXT |
mark name of the period end |
stat_last_mark_datetime |
TIMESTAMPTZ |
mark timestamp of the period end |
stat_increments |
BIGINT |
number of increments separating both sequence value at the period beginning and end |
stat_has_structure_changed |
BOOLEAN |
flag indicating whether any property of this sequence has changed during the period |
A NULL value supplied as end mark represents the current state.
The keyword ‘EMAJ_LAST_MARK’ can be used as mark name. It then represents the last set mark.
The function returns one row per sequence, even if no change has been detected during the period.
Most of the time, the stat_first_mark, stat_first_mark_datetime, stat_last_mark and stat_last_mark_datetime columns reference the start and end marks of the requested period. But they can contain other values when a sequence has been added or removed from the tables group during the requested time interval.
Sequence statistics are delivered quickly. Needed data are only stored into the small internal table that records the sequences state when marks are set.
But returned values may be approximative. Indeed, there is no way to detect temporary property changes during the period. Similarly, regarding the number of increments, there is no way to detect:
setval() function calls (used by E-Maj rollbacks for instance),
a return to the sequence minimum value (MINVALUE) if the sequence is cyclic (CYCLE) and the maximum value (MAXVALUE) has been reached,
an increment change during the period.
For a given sequence, the number of increments is computed as the difference between the LAST_VALUE at the period end and the LAST_VALUE at the period beginning, divided by the INCREMENT value at the period beginning. As a consequence, it is possible to get negative numbers of increments.
Using the emaj_sequence_stat_groups() function, log statistics can be obtained for several groups at once:
SELECT * FROM emaj.emaj_sequence_stat_groups('<group.names.array>', '<start.mark>', '<end.mark>');
More information about multi-groups functions.