Pitfalls I hit deploying a Hadoop cluster offline with Ambari

xiaoxiao  2021-02-27

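1. The HDP components were not copied completely to the remote host, so installing the clients failed with missing RPM packages. Copying the missing packages over by hand fixed it.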

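2. Installing HAWQ: on startup it failed with an error about passwordless SSH to the HAWQ hosts, and copying files between the HAWQ master and the other hosts kept prompting for a password. There were two causes.
First, the permissions for root's passwordless SSH login were wrong. The correct ones are: chmod 700 /root/.ssh and chmod 600 /root/.ssh/authorized_keys.
Second, switch to gpadmin (su gpadmin), create a file named hawq_host in /home/gpadmin containing the hostname of every node, and run hawq ssh-exkeys -f hawq_host. The log showed that the hostnames used in the RSA key exchange could not be reached; after fixing /etc/hosts and setting the hostnames again, the key exchange succeeded.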
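A minimal sketch of the first fix, wrapped in a function so it can be pointed at any user's .ssh directory. The function name fix_ssh_perms is hypothetical; the article itself just runs the two chmod commands on /root/.ssh. The modes matter because sshd's default StrictModes check rejects key-based login when ~/.ssh or authorized_keys is writable by group or others; 700 and 600 are the conventional safe values.

```shell
#!/bin/bash
# Hypothetical helper: apply the permissions sshd expects for key-based login.
# Defaults to the current user's ~/.ssh; the article's case is /root/.ssh.
fix_ssh_perms() {
  local ssh_dir="${1:-$HOME/.ssh}"
  # Only the owner may read, write or enter the directory.
  chmod 700 "$ssh_dir" || return 1
  # Only the owner may read or write the authorized keys, if the file exists.
  if [ -f "$ssh_dir/authorized_keys" ]; then
    chmod 600 "$ssh_dir/authorized_keys"
  fi
}
```

Usage: run fix_ssh_perms /root/.ssh as root on every node, then, as gpadmin, run hawq ssh-exkeys -f hawq_host.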

3. The installation failed partway through, so the services had to be uninstalled.

Uninstalling a single service

First stop it:

curl -s -u admin:admin -H "X-Requested-By: Ambari" -X PUT -d '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' http://AMBARI-HOST:8080/api/v1/clusters/CLUSTER_NAME/services/SERVICE_NAME

Then delete it:

curl -s -u admin:admin -H "X-Requested-By: Ambari" -X DELETE http://AMBARI-HOST:8080/api/v1/clusters/CLUSTER_NAME/services/SERVICE_NAME
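The two calls can be wrapped in small shell functions so that the host, cluster and service names are passed as arguments instead of being edited into the URL each time. This is a sketch: the function names are hypothetical, and AMBARI_USER/AMBARI_PASS are assumed environment variables that default to admin/admin as in the commands above. Ambari processes the stop request asynchronously, so the DELETE may be rejected until the service has actually reached the INSTALLED state.

```shell
#!/bin/bash
# Hypothetical helpers wrapping the Ambari REST calls for stopping and
# deleting a service. Port 8080 matches the original commands.
ambari_service_url() {
  # $1 = Ambari host, $2 = cluster name, $3 = service name
  printf 'http://%s:8080/api/v1/clusters/%s/services/%s' "$1" "$2" "$3"
}

ambari_stop_service() {
  # Setting the state to INSTALLED is how Ambari expresses "stopped".
  curl -s -u "${AMBARI_USER:-admin}:${AMBARI_PASS:-admin}" \
    -H 'X-Requested-By: Ambari' -X PUT \
    -d '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
    "$(ambari_service_url "$1" "$2" "$3")"
}

ambari_delete_service() {
  curl -s -u "${AMBARI_USER:-admin}:${AMBARI_PASS:-admin}" \
    -H 'X-Requested-By: Ambari' -X DELETE \
    "$(ambari_service_url "$1" "$2" "$3")"
}
```

Usage: ambari_stop_service AMBARI-HOST CLUSTER_NAME SERVICE_NAME, wait until the stop has finished in the Ambari UI, then run ambari_delete_service with the same arguments.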

Uninstalling the whole cluster (Ambari and Hadoop)

Run the following script:

#!/bin/bash
# Stop Ambari and the databases it uses
ambari-server stop
ambari-server reset
ambari-agent stop
service mysqld stop
service postgresql stop
# Run Ambari's host cleanup script (the python2.6 path may differ on other OS versions)
python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py
# Remove the installed packages
yum remove ambari\* hadoop hdfs bigtop-jsvc bigtop-tomcat hbase\* hadoop\* hdp-select ranger\* zookeeper\* postgresql-libs postgresql postgresql-server
yum remove mysql mysql-server mysql-libs mysql-connector-java
# Remove install, data, runtime, log and config directories
# (note: /var/lib/mysql and /var/lib/pgsql hold the MySQL and PostgreSQL data)
rm -rf /opt/hadoop /opt/app/hadoop /opt/app/ambari-metrics-collector /opt/kafka-logs
rm -rf /usr/hdp /usr/hadoop /usr/kafka-logs /usr/lib/ambari* /usr/lib/hadoop /usr/lib/nagios /usr/lib/ams-hbase
rm -rf /var/nagios /var/kafka-logs
rm -rf /var/lib/ambari* /var/lib/flume /var/lib/ganglia* /var/lib/hadoop* /var/lib/hdfs /var/lib/hive /var/lib/atlas /var/lib/mysql /var/lib/pgsql
rm -rf /var/run/hadoop /var/run/hbase /var/run/zookeeper /var/run/flume /var/run/webhcat /var/run/hadoop-yarn /var/run/hadoop-mapreduce
rm -rf /var/run/accumulo /var/run/ambari* /var/run/atlas /var/run/nagios /var/run/spark
rm -rf /var/log/hbase /var/log/hive /var/log/zookeeper /var/log/flume /var/log/hadoop-yarn /var/log/hadoop-mapreduce
rm -rf /var/log/accumulo /var/log/ambari* /var/log/atlas /var/log/nagios /var/log/spark /var/log/hadoop
rm -rf /tmp/ambari-qa
rm -rf /etc/ambari* /etc/ams-hbase /etc/flume /etc/ganglia /etc/hadoop* /etc/hbase /etc/hive* /etc/nagios /etc/phoenix /etc/pig /etc/tez /etc/zookeeper /etc/accumulo /etc/atlas /etc/spark /etc/mahout
rm -rf /home/accumulo /home/ams /home/atlas /home/mahout /home/nagios /home/spark
# Remove the repo files (file names depend on the HDP version installed)
rm -rf /etc/yum.repos.d/ambari.repo /etc/yum.repos.d/HDP-2.3.0.0.repo /etc/yum.repos.d/HDP-UTILS.repo /etc/yum.repos.d/HDP.repo
yum clean all

Afterwards, check for leftover Java processes:

ps -elf | grep java

In addition, remove the service user accounts with userdel.
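A minimal sketch of that step, assuming the service accounts match the /home directories removed by the script above (accumulo, ams, atlas, mahout, nagios, spark). Other HDP accounts such as hdfs, yarn, hive or zookeeper are assumptions to check against /etc/passwd on your own hosts. The function name is hypothetical. userdel -r also removes each account's home directory and mail spool.

```shell
#!/bin/bash
# Hypothetical helper: delete each named account if it exists.
# Accounts that do not exist are skipped silently; failures are reported.
remove_service_users() {
  local u
  for u in "$@"; do
    if id "$u" >/dev/null 2>&1; then
      userdel -r "$u" || echo "failed to delete $u" >&2
    fi
  done
}
```

Usage: remove_service_users accumulo ams atlas mahout nagios spark. If userdel refuses because a process still runs under the account, see item 6 below.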

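4. After all the services were uninstalled, yum stopped working. The cause was that the uninstall had removed Python components.

Run whereis python to find the remaining Python interpreter, then edit /usr/bin/yum (vi /usr/bin/yum) and change the Python path in it to that interpreter.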
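The same edit can be made non-interactively with sed by rewriting the script's first (shebang) line. A sketch: the function name is hypothetical, and the interpreter path in the usage line is an assumption; use whatever path whereis python reports on your host. A .bak copy is kept before editing.

```shell
#!/bin/bash
# Hypothetical helper: point a script's shebang line at a given interpreter.
set_shebang() {
  local file="$1" interp="$2"
  # Refuse to point the script at something that cannot run.
  [ -x "$interp" ] || { echo "not executable: $interp" >&2; return 1; }
  cp -p "$file" "$file.bak"   # keep a backup of the original script
  # Replace line 1 only, and only if it is a shebang.
  sed -i '1s|^#!.*|#!'"$interp"'|' "$file"
}
```

Usage (interpreter path assumed): set_shebang /usr/bin/yum /usr/bin/python2.6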

5. Installing the Metrics Monitor client failed with: require python-2.6.6-64 while installed python-2.6.6-66

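From the Packages directory of the mounted ISO image, copy out the matching python, python-devel and python-libs RPMs. Remove the installed python-2.6.6-66 with rpm -e --nodeps python, then reinstall python-2.6.6-64; the error went away.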

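6. The AMS service could not be stopped, its process could not be killed, and userdel could not delete its user. Rebooting the machine fixed it.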

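7. DataNode and ZooKeeper died shortly after starting, and their logs showed "Address already in use". Look up the port in the component's log under /var/log/....., find the process holding that port with netstat -anp | grep <port>, kill it, and restart the service; it then started successfully.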
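The lookup can be scripted by filtering netstat -anp output for sockets in LISTEN state on the given port. A sketch with a hypothetical function name, assuming the standard Linux netstat column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name). It reads from stdin so the parsing can be checked against saved output, and matching on the end of the local address avoids the false hits a plain grep 2181 gives on ports such as 21810.

```shell
#!/bin/bash
# Hypothetical helper: print the PIDs of TCP sockets listening on port $1,
# parsed from `netstat -anp` output read on stdin.
pids_listening_on() {
  awk -v port="$1" '
    $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") {
      split($7, a, "/")             # "12345/java" -> a[1] = "12345"
      if (a[1] ~ /^[0-9]+$/) print a[1]   # skip "-" when the PID is hidden
    }' | sort -u
}
```

Usage: netstat -anp | pids_listening_on 2181 | xargs -r kill. Run it as root so netstat can show the PIDs of other users' processes, and check with ps that the PIDs really are the stale processes before killing them.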

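8. The HAWQ master would not start. After running sysctl -p, which reloads the kernel parameters from /etc/sysctl.conf, it started normally.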

When reposting, please credit the original: https://www.6miu.com/read-4615.html
