Scenario: data on an old cluster needs to be migrated to a new cluster.
hadoop distcp [option] hdfs://master_ip:8020/hive/warehouse/xxx.db/tab_name hdfs://master_ip:8020/hive/warehouse/xxx.db/tab_name
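To see the available options, run hadoop distcp with no arguments and it prints the help text, so they are not explained here.
master_ip: the IP of the cluster's master node. The first URI is the source (the old cluster) and the second is the destination (the new cluster), so each one takes the master IP of its own cluster.
tab_name: the name of the table to migrate.
The paths must be correct. If you do not know a table's path, run desc formatted db_name.tab_name to look it up. The Location field is the correct path; just replace the host part (test01 in the example below) with master_ip:port.
For example: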
hive> desc formatted aidemo.ac_ref;
OK
# col_name              data_type       comment

pkg_name                string
label                   string

# Detailed Table Information
Database:               aidemo
Owner:                  hchou
CreateTime:             Wed Jun 07 15:34:35 CST 2017
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://test01/hive/warehouse/aidemo.db/ac_ref
Table Type:             MANAGED_TABLE
Table Parameters:
        transient_lastDdlTime   1496820875

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        field.delim             \t
        serialization.format    \t
Time taken: 0.078 seconds, Fetched: 28 row(s)
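
The substitution described above (swapping test01 for master_ip:port) can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name rewrite_authority and the addresses 10.0.0.1 and 10.0.0.2 are made up for the example, and the script only prints the resulting distcp command rather than running it.

```shell
#!/usr/bin/env bash
# Rewrite the host part of an HDFS URI (for example "test01") to an explicit
# master_ip:port. Pure string handling; Hadoop is not needed to run this.
# rewrite_authority is a hypothetical helper name, not part of Hadoop or Hive.
rewrite_authority() {
  local location="$1" authority="$2"
  # hdfs://<everything up to the next slash>/rest -> hdfs://<authority>/rest
  printf '%s\n' "$location" | sed -E "s#^hdfs://[^/]+#hdfs://${authority}#"
}

# Location as reported by desc formatted for aidemo.ac_ref.
location="hdfs://test01/hive/warehouse/aidemo.db/ac_ref"

# Hypothetical addresses: replace with the real master IPs and port.
src_master="10.0.0.1:8020"   # old cluster (source)
dst_master="10.0.0.2:8020"   # new cluster (destination)

src_uri="$(rewrite_authority "$location" "$src_master")"
dst_uri="$(rewrite_authority "$location" "$dst_master")"

# Print the command instead of executing it, so it can be reviewed first.
echo "hadoop distcp $src_uri $dst_uri"
```

Once the printed command looks right, it can be copied and run on a node that can reach both clusters, adding whatever options from the help text are needed.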