pgxl 系列之六: ubuntu 16.04 的业务数据模拟

xiaoxiao2025-07-31  22

os: ubuntu 16.04 postgres-xl: 10alpha3

ip 规划:

gtm 192.168.56.99 gtm gtms 192.168.56.100 gtm slave

node1 192.168.56.101 gtm-proxy,coordinator,datanode node2 192.168.56.102 gtm-proxy,coordinator,datanode node3 192.168.56.103 gtm-proxy,coordinator,datanode node4 192.168.56.104 gtm-proxy,coordinator,datanode

node1、node2 互为主从,node3、node4 互为主从 端口使用情况

gtm 20001 gtm slave 20001 gtm proxy 20001 gtm proxy 20001 coordinator 20004 coordinator 20005 coordinator slave 20005 coordinator slave 20004 coordinator pool 20010 coordinator pool 20011 20011 20010 datanode 20008 datanode 20009 datanode slave 20009 datanode slave 20008 datanode pool 20012 datanode pool 20013 20013 20012

模拟业务数据,做一些常规的增删改查操作。

连接 pgxl coordinator

需要修改 node1、node2、node3、node4节点所有实例的pg_hba.conf

$ vi pg_hba.conf host all all 192.168.0.101/32 trust host all all 192.168.0.102/32 trust host all all 192.168.0.103/32 trust host all all 192.168.0.104/32 trust host all all 192.168.0.0/24 md5

app只能连接 coordinator,不能直连 datanode 节点,架构图参考 https://www.postgres-xl.org/overview/ 在 node1 节点使用postgres登录 coordinator,创建app连接的用户。

$ psql -p 20004 psql (PGXL 10alpha2, based on PG 10beta3 (Postgres-XL 10alpha2)) Type "help" for help. postgres=# create user peiyb with password 'rootroot'; CREATE ROLE postgres=# create database peiybdb with owner=peiyb; CREATE DATABASE postgres=# postgres=# peiybdb=> \du List of roles Role name | Attributes | Member of -----------+------------------------------------------------------------+----------- peiyb | | {} postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {} peiybdb=> \l List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges -----------+----------+----------+-------------+-------------+----------------------- peiybdb | peiyb | UTF8 | en_US.UTF-8 | en_US.UTF-8 | postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres + | | | | | postgres=CTc/postgres template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres + | | | | | postgres=CTc/postgres (4 rows) postgres=# \q

在 node1 节点使用peiyb登录 coordinator

$ psql -p 20004 -U peiyb peiybdb psql (PGXL 10alpha2, based on PG 10beta3 (Postgres-XL 10alpha2)) Type "help" for help. peiybdb=> peiybdb=>

创建机构表、销量表

peiybdb=> create table dm_org(id varchar(100),name varchar(100)); CREATE TABLE peiybdb=> create table fact_sales(day_id varchar(10),time_id varchar(10),org_id varchar(100),product_id varchar(100),qty int8,amount numeric(10,2)); CREATE TABLE peiybdb=> \d List of relations Schema | Name | Type | Owner --------+------------+-------+------- public | dm_org | table | peiyb public | fact_sales | table | peiyb (2 rows) peiybdb=> \d+ dm_org Table "public.dm_org" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+------------------------+-----------+----------+---------+----------+--------------+------------- id | character varying(100) | | | | extended | | name | character varying(100) | | | | extended | | Distribute By: HASH(id) Location Nodes: ALL DATANODES peiybdb=> \d+ fact_sales Table "public.fact_sales" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ------------+------------------------+-----------+----------+---------+----------+--------------+------------- day_id | character varying(10) | | | | extended | | time_id | character varying(10) | | | | extended | | org_id | character varying(100) | | | | extended | | product_id | character varying(100) | | | | extended | | qty | bigint | | | | plain | | amount | numeric(10,2) | | | | main | | Distribute By: HASH(day_id) Location Nodes: ALL DATANODES

可以看到,多了Distribute By 和 Location Nodes两行。

peiybdb=> \h create table [ DISTRIBUTE BY { REPLICATION | ROUNDROBIN | { [HASH | MODULO ] ( column_name ) } } | DISTRIBUTED { { BY ( column_name ) } | { RANDOMLY } | DISTSTYLE { EVEN | KEY | ALL } DISTKEY ( column_name ) ] [ TO { GROUP groupname | NODE ( nodename [, ... ] ) } ]

如果DISTRIBUTE BY 后面有如下选项: REPLICATION,则是复制模式,其余则是分片模式 ROUNDROBIN 指的是按照轮询的方式分布数据 HASH 指的是按照指定列的哈希值分布数据 MODULO 指的是按照指定列的取摩运算分布数据

TO GROUP|NODE 指定了数据分布的节点范围,如果没有指定则默认所有数据节点参与数据分布。 如果没有指定分布模式,即使用普通的CREATE TABLE语句,PGXL会默认采用分片模式将数据分布到所有数据节点。

初始化机构表、销量表数据

peiybdb=> select * from dm_org; id | name -------+--------- 10001 | A经营部 10003 | C经营部 10004 | D经营部 10002 | B经营部 10005 | E经营部 10006 | F经营部 (6 rows) peiybdb=> insert into dm_org(id,name) values('10001','A经营部'),('10002','B经营部'),('10003','C经营部'),('10004','D经营部'),('10005','E经营部'),('10006','F经营部'); peiybdb=> insert into fact_sales(day_id,time_id,org_id,product_id,qty,amount) select t_day.day_id, '' as time_id, t_org.id, ((random()*100000)::int8)::varchar as product_id, t_sale.qty, t_sale.amount from (select to_char(generate_series('2018-10-01'::date,'2018-10-25','1 day'),'yyyymmdd') as day_id) t_day, (SELECT (random()*10000)::int8 as qty, (random()*10000)::numeric(10,2) as amount FROM generate_series(1,10000) ) as t_sale, (select id,name from dm_org) t_org where 1=1 ;

查询一:B经营部 在 20181010 的总的销售量和销售额。

peiybdb=> explain select sale.day_id, org.name, sum(qty), sum(amount) from dm_org org, fact_sales sale where 1=1 and org.name='B经营部' and org.id=sale.org_id and sale.day_id='20181010' group by sale.day_id, org.name QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Finalize GroupAggregate (cost=100.00..123.95 rows=1 width=320) Group Key: sale.day_id, org.name -> Remote Subquery Scan on all (node1,node2,node3,node4) (cost=100.00..123.92 rows=1 width=320) -> Partial GroupAggregate (cost=0.00..23.91 rows=1 width=320) Group Key: sale.day_id, org.name -> Nested Loop (cost=0.00..23.89 rows=1 width=280) Join Filter: ((org.id)::text = (sale.org_id)::text) -> Remote Subquery Scan on all (node1,node2,node3,node4) (cost=100.00..112.57 rows=1 width=436) Distribute results by H: id -> Seq Scan on dm_org org (cost=0.00..12.12 rows=1 width=436) Filter: ((name)::text = 'B经营部'::text) -> Materialize (cost=100.00..112.04 rows=1 width=280) -> Remote Subquery Scan on all (node1,node2,node3,node4) (cost=100.00..112.03 rows=1 width=280) Distribute results by H: org_id -> Seq Scan on fact_sales sale (cost=0.00..11.75 rows=1 width=280) peiybdb=> select sale.day_id, org.name, sum(qty), sum(amount) from dm_org org, fact_sales sale where 1=1 and org.name='B经营部' and org.id=sale.org_id and sale.day_id='20181010' group by sale.day_id, org.name day_id | name | sum | sum ----------+---------+----------+------------- 20181010 | B经营部 | 49843700 | 49806627.37 (1 row)

执行计划中出现的 Remote Subquery Scan on all (node1,node2,node3,node4),表明在sql在所有节点都执行了。

登录任何一个 coordinator ,都视为登录同一台 postgresql database cluster。

另外: 维度表一般都是小表,可以在每个节点均保存完整数据,在查询时效率更高。 比如 create table dm_org1(id varchar(100),name varchar(100)) DISTRIBUTE BY REPLICATION ;

查询一 中使用 dm_org、dm_org1对比,前者需要 Time: 3416.255 ms (00:03.416),后者只需要 Time: 485.845 ms,差异还是比较明显。

事实表从一开始是就应该设计成分片,确保在每个节点的数据量尽可能相同。

参考: https://www.postgres-xl.org/documentation/ https://www.postgres-xl.org/documentation/sql-createtable.html

转载请注明原文地址: https://www.6miu.com/read-5034039.html

最新回复(0)