PostgreSql 系统表 pg

xiaoxiao2021-02-28 86

pg_stats是由pg_statistic系统表扩展而来的系统视图，记录的是每个表每个字段的统计信息，用于PostgreSQL优化器做执行计划选择的时候提供参考。

名字类型引用描述schemanamenamepg_namespace.nspname包含此表的模式名字tablenamenamepg_class.relname表的名字attnamenamepg_attribute.attname这一行描述的字段的名字inheritedbool 如果为真，那么这行包含继承的子字段，不只是指定表的值。null_fracreal 记录中字段为空的百分比avg_widthinteger 字段记录以字节记的平均宽度n_distinctreal 如果大于零，就是在字段中独立数值的估计数目。如果小于零，就是独立数值的数目被行数除的负数。用负数形式是因为ANALYZE 认为独立数值的数目是随着表增长而增长；正数的形式用于在字段看上去好像有固定的可能值数目的情况下。比如， -1 表示一个唯一字段，独立数值的个数和行数相同。most_common_valsanyarray 一个字段里最常用数值的列表。如果看上去没有啥数值比其它更常见，则为 nullmost_common_freqsreal[] 一个最常用数值的频率的列表，也就是说，每个出现的次数除以行数。如果most_common_vals是 null ，则为 null。histogram_boundsanyarray 一个数值的列表，它把字段的数值分成几组大致相同热门的组。如果在most_common_vals里有数值，则在这个饼图的计算中省略。如果字段数据类型没有<操作符或者most_common_vals 列表代表了整个分布性，则这个字段为 null。correlationreal 统计与字段值的物理行序和逻辑行序有关。它的范围从 -1 到 +1 。在数值接近 -1 或者 +1 的时候，在字段上的索引扫描将被认为比它接近零的时候开销更少，因为减少了对磁盘的随机访问。如果字段数据类型没有<操作符，那么这个字段为null。most_common_elemsanyarray 经常在字段值中出现的非空元素值的列表。（标量类型为空。）most_common_elem_freqsreal[] 最常见元素值的频率列表，也就是，至少包含一个给定值的实例的行的分数。每个元素频率跟着两到三个附加的值；它们是在每个元素频率之前的最小和最大值，还有可选择的null元素的频率。（当most_common_elems 为null时，为null）elem_count_histogramreal[] 该字段中值的不同非空元素值的统计直方图，跟着不同非空元素的平均值。（标量类型为空。）

下面通过实例来解释表中参数的意思：

1. 建表 create table tb13(id integer,name character varying,age integer); 2. 插入数据 postgres=# insert into tb13 select generate_series(1,10000),'john',(random()*1000)::integer; INSERT 0 10000 postgres=# insert into tb13 select generate_series(10001,10100),'tom',(random()*1000)::integer; INSERT 0 100 为了方便讲解，将id字段的statistics target值设置为10： postgres=# ALTER TABLE tb13 ALTER COLUMN id set STATISTICS 10; ALTER TABLE 3. 分析表 postgres=# analyze tb13; ANALYZE 4. 取id字段的统计信息看看 postgres=# select * from pg_stats where tablename='tb13' and attname='id'; -[ RECORD 1 ]----------+------------------------------------------------------- schemaname | public （表所在的schema） tablename | tb13 （表名） attname | id （字段名） inherited | f （是否是继承而来的字段，t：是；f：否） null_frac | 0 （null值的百分比，这里为0%） avg_width | 4 （该字段的平均长度） n_distinct | -1 （表示该字段的唯一值的个数，-1：表示该字段有唯一约束，大于0的整数，比如m：表示该字段有m个唯一值） most_common_vals | （高频值，这里没有，因为是主键） most_common_freqs | （高频值的出现的频率） histogram_bounds | {1,1010,2020,3030,4040,5050,6060,7070,8080,9090,10100} (该字段除高频值以外值的的柱状图信息) correlation | 1 （表中记录的逻辑顺序与存储的物理顺序的关系，-1到1之间，1表示逻辑顺序与存储的物理顺序相同，-1表示逻辑顺序与存储的物理顺序相反） most_common_elems | （该字段是数组元素的统计信息，高频元素） most_common_elem_freqs | （该字段是数组元素的统计信息，高频元素出现的频率） elem_count_histogram | （该字段是数组元素的统计信息，该列元素唯一值个数平均分布柱状图）

解释histogram_bounds={1,1010,2020,3030,4040,5050,6060,7070,8080,9090,10100}表示的意思：

　　如图所示，x轴表示id值，y轴表示的是记录数，表示PostgreSQL统计分析后的结果是，id字段值为1~1010的记录数有1009（1010-1）条，1010~2020的记录数有1010条，…. … ，展示的是id字段的值的分布区间图。　　如果statistics target越大，取得的区间数就越多，统计就越精确。

postgres=# alter table tb13 alter column id set statistics 100; ALTER TABLE postgres=# postgres=# analyze tb13; ANALYZE postgres=# select * from pg_stats where tablename='tb13' and attname='id'; -[ RECORD 1 ]----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- schemaname | public tablename | tb13 attname | id inherited | f null_frac | 0 avg_width | 4 n_distinct | -1 most_common_vals | most_common_freqs | histogram_bounds | {1,101,202,303,404,505,606,707,808,909,1010,1111,1212,1313,1414,1515,1616,1717,1818,1919,2020,2121,2222,2323,2424,2525,2626,2727,2828,2929,3030,3131,3232,3333,3434,3535,3636,3737,3838,3939,4040,4141,4242,4343,4444,4545,4646,4747,4848,4949,5050,5151,5252,5353,5454,5555,5656,5757,5858,5959,6060,6161,6262,6363,6464,6565,6666,6767,6868,6969,7070,7171,7272,7373,7474,7575,7676,7777,7878,7979,8080,8181,8282,8383,8484,8585,8686,8787,8888,8989,9090,9191,9292,9393,9494,9595,9696,9797,9898,9999,10100} correlation | 1 most_common_elems | most_common_elem_freqs | elem_count_histogram |

把statistics的值调成100，统计结果就有100个区间数。

5. 取name字段的统计信息看看： postgres=# select * from pg_stats where tablename='tb13' and attname='name'; -[ RECORD 1 ]----------+---------------------- schemaname | public tablename | tb13 attname | name inherited | f null_frac | 0 avg_width | 4 n_distinct | 2 （这里是2，表示该字段有2个唯一值） most_common_vals | {john,tom} （高频值是‘john’和‘tom’） most_common_freqs | {0.990099,0.00990099} （‘john’对应的频率是0.990099，‘tom’对应的频率是0.00990099） histogram_bounds | （该字段除了高频值就没有其他值了，所有没有该字段的柱状图） correlation | 1（表示逻辑顺序与存储的物理顺序相同） most_common_elems | most_common_elem_freqs | elem_count_histogram | 6. correlation字段的说明 postgres=# create table tb15(id integer); CREATE TABLE postgres=# insert into tb15 select generate_series(1,5); INSERT 0 5 postgres=# select * from tb15; id ---- 1 2 3 4 5 (5 rows) postgres=# analyze tb15; ANALYZE postgres=# select tablename,attname,correlation from pg_stats where tablename ='tb15' and attname='id'; tablename | attname | correlation -----------+---------+------------- tb15 | id | 1 (1 row) postgres=# truncate tb15; TRUNCATE TABLE postgres=# insert into tb15 select generate_series(5,1,-1); INSERT 0 5 postgres=# select * from tb15; id ---- 5 4 3 2 1 (5 rows) postgres=# analyze tb15; ANALYZE postgres=# select tablename,attname,correlation from pg_stats where tablename ='tb15' and attname='id'; tablename | attname | correlation -----------+---------+------------- tb15 | id | -1 (1 row) 7. 数组字段的统计分析 postgres=# create table tb16 (id integer,info int[]); CREATE TABLE postgres=# postgres=# insert into tb16 select generate_series(1,10000),'{1,1,2,2,3,3,4,4}'::int[]; INSERT 0 10000 postgres=# insert into tb16 select generate_series(1,100),'{100,200}'::int[]; INSERT 0 100 postgres=# postgres=# analyze tb16; ANALYZE postgres=# postgres=# select * from pg_stats where tablename ='tb16' and attname='info'; -[ RECORD 1 ]----------+--------------------------------------------------- schemaname | public tablename | tb16 attname | info inherited | f null_frac | 0 avg_width | 52 n_distinct | 2 most_common_vals | {"{1,1,2,2,3,3,4,4}","{100,200}"}(高频字段值) most_common_freqs | {0.990099,0.00990099}（高频字段值对应的频率） histogram_bounds | correlation | 1 most_common_elems | {1,2,3,4,100,200}（高频元素值） most_common_elem_freqs | {0.990099,0.990099,0.990099,0.990099,0.00990099,0.00990099,0.00990099,0.990099,0}（高频元素值对应的频率） elem_count_histogram | {2,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,3.9802}（该列元素唯一值个数平均分布柱状图, 末尾为平均唯一值个数. 本例说明大部分为4个元素（就是｛1，1，2，2，3，3，4，4｝这个数组））

8. most_common_vals 与most_common_elems 的区别：

most_common_vals 是字段值为单位的高频值。most_common_elems ，当字段类型是数组类型的时候会出现，表示元素的高频值，数组里面的元素。

转载请注明原文地址: https://www.6miu.com/read-70470.html

技术

最新回复(0)