Caused by: java.io.IOException: Lease timeout of 0 seconds expired.


2017-07-09 10:33:07.040 [pool-2-thread-9] ERROR com.bonree.browser.util.GenerateParquet - can not write PageHeader(type:DICTIONARY_PAGE, uncompressed_page_size:34, compressed_page_size:34, dictionary_page_header:DictionaryPageHeader(num_values:2, encoding:PLAIN_DICTIONARY))
java.io.IOException: can not write PageHeader(type:DICTIONARY_PAGE, uncompressed_page_size:34, compressed_page_size:34, dictionary_page_header:DictionaryPageHeader(num_values:2, encoding:PLAIN_DICTIONARY))
    at org.apache.parquet.format.Util.write(Util.java:224) ~[parquet-format-2.3.0-incubating.jar:2.3.0-incubating]
    at org.apache.parquet.format.Util.writePageHeader(Util.java:61) ~[parquet-format-2.3.0-incubating.jar:2.3.0-incubating]
    at org.apache.parquet.format.converter.ParquetMetadataConverter.writeDictionaryPageHeader(ParquetMetadataConverter.java:732) ~[parquet-hadoop-1.7.0.jar:1.7.0]
    at org.apache.parquet.hadoop.ParquetFileWriter.writeDictionaryPage(ParquetFileWriter.java:238) ~[parquet-hadoop-1.7.0.jar:1.7.0]
    at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writeToFileWriter(ColumnChunkPageWriteStore.java:179) [parquet-hadoop-1.7.0.jar:1.7.0]
    at org.apache.parquet.hadoop.ColumnChunkPageWriteStore.flushToFileWriter(ColumnChunkPageWriteStore.java:238) [parquet-hadoop-1.7.0.jar:1.7.0]
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:155) [parquet-hadoop-1.7.0.jar:1.7.0]
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113) [parquet-hadoop-1.7.0.jar:1.7.0]
    at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:267) [parquet-hadoop-1.7.0.jar:1.7.0]
    at com.bonree.browser.util.GenerateParquet.closeAjaxWriter(GenerateParquet.java:334) [classes/:na]
    at com.bonree.browser.business.ConsumeFileBusiness.close(ConsumeFileBusiness.java:280) [classes/:na]
    at com.bonree.browser.business.ConsumeFileBusiness.run(ConsumeFileBusiness.java:148) [classes/:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_79]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_79]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
Caused by: parquet.org.apache.thrift.transport.TTransportException: java.io.IOException: Lease timeout of 0 seconds expired.
    at parquet.org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147) [parquet-format-2.3.0-incubating.jar:2.3.0-incubating]
    at parquet.org.apache.thrift.transport.TTransport.write(TTransport.java:105) [parquet-format-2.3.0-incubating.jar:2.3.0-incubating]
    at parquet.org.apache.thrift.protocol.TCompactProtocol.writeByteDirect(TCompactProtocol.java:424) [parquet-format-2.3.0-incubating.jar:2.3.0-incubating]
    at parquet.org.apache.thrift.protocol.TCompactProtocol.writeByteDirect(TCompactProtocol.java:431) [parquet-format-2.3.0-incubating.jar:2.3.0-incubating]
    at parquet.org.apache.thrift.protocol.TCompactProtocol.writeFieldBeginInternal(TCompactProtocol.java:194) [parquet-format-2.3.0-incubating.jar:2.3.0-incubating]
    at parquet.org.apache.thrift.protocol.TCompactProtocol.writeFieldBegin(TCompactProtocol.java:176) [parquet-format-2.3.0-incubating.jar:2.3.0-incubating]
    at org.apache.parquet.format.InterningProtocol.writeFieldBegin(InterningProtocol.java:74) [parquet-format-2.3.0-incubating.jar:2.3.0-incubating]
    at org.apache.parquet.format.PageHeader.write(PageHeader.java:918) [parquet-format-2.3.0-incubating.jar:2.3.0-incubating]
    at org.apache.parquet.format.Util.write(Util.java:222) [parquet-format-2.3.0-incubating.jar:2.3.0-incubating]
    ... 16 common frames omitted
Caused by: java.io.IOException: Lease timeout of 0 seconds expired.
    at org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:2063) [hadoop-hdfs-2.5.2.jar:na]
    at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:871) [hadoop-hdfs-2.5.2.jar:na]
    at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:825) [hadoop-hdfs-2.5.2.jar:na]
    at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:417) [hadoop-hdfs-2.5.2.jar:na]
    at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:442) [hadoop-hdfs-2.5.2.jar:na]
    at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71) ~[hadoop-hdfs-2.5.2.jar:na]
    at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298) ~[hadoop-hdfs-2.5.2.jar:na]
    ... 1 common frames omitted

Read literally, this says the lease on the file operation expired; in practice it means the file was deleted while the data stream was still writing to it. I had run into this before, usually because multiple MapReduce tasks operate on the same file and one task deletes it after finishing.

This time, however, the exception occurred while uploading a file to HDFS and caused the upload to fail. A quick search turned up reports tying it to the dfs.datanode.max.xcievers parameter reaching its ceiling. This parameter is the upper bound on the number of requests a DataNode handles concurrently; the default is 256, and our cluster had it set to 2048. Scanning the logs on every DataNode did indeed turn up this IOException:

java.io.IOException: xceiverCount 2049 exceeds the limit of concurrent xcievers 2048

Digging through the source for "xcievers" turns up two classes, DataXceiver and DataXceiverServer. DataXceiver is a thread started by DataXceiverServer to handle an input/output data stream, and its run() method contains the following check:

    public void run() {
        ...
        int curXceiverCount = datanode.getXceiverCount();
        if (curXceiverCount > dataXceiverServer.maxXceiverCount) {
            throw new IOException("xceiverCount " + curXceiverCount
                + " exceeds the limit of concurrent xcievers "
                + dataXceiverServer.maxXceiverCount);
        }
        ...
    }

Reading the stack traces together: once the xceiver count exceeded the limit, the DataNode threw an IOException; on the DFSClient side the file being written stopped responding, which surfaced as the lease-timeout exception above.

Solution: raise the xceiver limit further, to 8192, and restart the cluster for it to take effect.

The default setting of dfs.datanode.max.xcievers is:

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>256</value>
    </property>

For a DataNode, dfs.datanode.max.xcievers behaves like the file-handle limit on Linux: once the number of connections on the DataNode exceeds the configured value, the DataNode starts refusing connections. It is normally set very high, on the order of 40000+.
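Before raising the value yet again, it is worth confirming on each node whether the limit is actually being hit; the log message quoted above is easy to scan for. A minimal sketch follows, assuming a hypothetical log path (adjust it to wherever your DataNode writes its logs); it is Java 7 compatible to match the JVM shown in the stack trace:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class ScanDataNodeLog {
        public static void main(String[] args) throws IOException {
            // Hypothetical log location -- substitute the DataNode log path used on your nodes.
            String logPath = args.length > 0 ? args[0] : "/var/log/hadoop/hadoop-hdfs-datanode.log";
            try (BufferedReader reader = Files.newBufferedReader(Paths.get(logPath), StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // This is the message the DataNode logs when the xceiver limit is exceeded.
                    if (line.contains("exceeds the limit of concurrent xcievers")) {
                        System.out.println(line);
                    }
                }
            }
        }
    }

If the scan turns up nothing on any DataNode, the lease timeout is more likely caused by the file being deleted mid-write than by the xceiver ceiling.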

Hadoop log error: java.io.IOException: Premature EOF from inputStream

2015-03-17 11:24:25,467 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: d135.hadoop:50010:DataXceiver error processing WRITE_BLOCK operation  src: /192.168.1.118:64599 dst: /192.168.1.135:50010
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.HashMap.createEntry(HashMap.java:897)
    at java.util.HashMap.addEntry(HashMap.java:884)
    at java.util.HashMap.put(HashMap.java:505)
    at java.util.HashSet.add(HashSet.java:217)

2015-03-17 10:06:30,822 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: d135.hadoop:50010:DataXceiver error processing WRITE_BLOCK operation  src: /192.168.1.135:36440 dst: /192.168.1.135:50010
java.io.IOException: Premature EOF from inputStream
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:446)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:702)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:739)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:745)

Cause: the lease on the file operation expired, which in practice means the file was deleted while the data stream was still operating on it. This is usually caused by multiple MapReduce tasks operating on the same file, with one task deleting it after finishing. The error is related to the dfs.datanode.max.transfer.threads parameter reaching its ceiling. This is the upper bound on the number of requests a DataNode handles concurrently; the default is 4096, and the parameter's valid range is [1, 8192].

Fix: edit the Hadoop configuration file hdfs-site.xml on every DataNode and add the dfs.datanode.max.transfer.threads property, setting it to 8192:

    <property>
      <name>dfs.datanode.max.transfer.threads</name>
      <value>8192</value>
    </property>
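To sanity-check what value a process will actually pick up after the edit, you can load the modified hdfs-site.xml into a Configuration and read the key back. A minimal sketch; the file path is an assumption (point it at the copy deployed on the DataNode), and the fallback to the old dfs.datanode.max.xcievers name covers older configs:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class TransferThreadsCheck {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Assumed location -- use the hdfs-site.xml actually deployed on the node.
            conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));

            // Prefer the current key name; fall back to the old name, then to the documented default.
            int limit = conf.getInt("dfs.datanode.max.transfer.threads",
                    conf.getInt("dfs.datanode.max.xcievers", 4096));
            System.out.println("Configured DataXceiver thread limit: " + limit);
        }
    }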

(dfs.datanode.max.xcievers and dfs.datanode.max.transfer.threads refer to the same parameter; the former is the older Hadoop 1.x name.) What happens if dfs.datanode.max.xcievers is set too low? In short, this parameter is the number of threads on a DataNode responsible for file operations. If there are too many files to handle and the parameter is set too low, some of the files cannot be processed and the following exception is thrown:

ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(10.10.10.53:50010,storageID=DS-1570581820-10.10.10.53-50010-1224117842339,infoPort=50075, ipcPort=50020):DataXceiver: java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256

As we know, on a Linux system every file operation is bound to a socket; figuratively speaking, operating on a file means operating on that socket, and more concretely you can think of it as a thread. This parameter specifies how many such threads are allowed. For example:

    public DFSInputStream open(String src) throws IOException
    public FSDataOutputStream create(Path f) throws IOException
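To make the mapping concrete, here is a minimal client-side sketch using the standard FileSystem API (the NameNode URI and file path are placeholders): each create() and open() call ends up being served by one DataXceiver thread on the DataNode holding the block.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class XceiverDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder URI -- fs.defaultFS should point at your NameNode.
            conf.set("fs.defaultFS", "hdfs://namenode:8020");
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/tmp/xceiver-demo.txt");
            // create(): the DataNode spawns a DataXceiver to receive the block being written.
            try (FSDataOutputStream out = fs.create(file)) {
                out.writeUTF("hello");
            }
            // open() + read: another DataXceiver serves the read on the DataNode side.
            try (FSDataInputStream in = fs.open(file)) {
                System.out.println(in.readUTF());
            }
            fs.delete(file, true);
        }
    }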

In HDFS, an open() or create() call like this maps, on the server side (the DataNode), to creating a new thread (socket) to handle it. Here is how that thread mechanism works: inside the DataNode there is a dedicated thread group that holds these threads, plus a daemon thread, DataXceiverServer, that watches the size of the group. It checks whether the thread count has reached the ceiling and throws an exception once it is exceeded:

    // Created when the DataNode starts up: the thread group that holds the xceiver threads
    this.threadGroup = new ThreadGroup("dataXceiverServer");
    this.dataXceiverServer = new Daemon(threadGroup, new DataXceiverServer(ss, conf, this));
    this.threadGroup.setDaemon(true); // auto destroy when empty

    // DataXceiverServer monitors the count at all times and raises the alarm
    /** Number of concurrent xceivers per node. */
    int getXceiverCount() {
        return threadGroup == null ? 0 : threadGroup.activeCount();
    }

    if (curXceiverCount > dataXceiverServer.maxXceiverCount) {
        throw new IOException("xceiverCount " + curXceiverCount
            + " exceeds the limit of concurrent xcievers "
            + dataXceiverServer.maxXceiverCount);
    }

Why does this mechanism exist? Because if too many of these threads are created, the system runs out of memory: a thread takes roughly 1 MB of memory, so a DataNode with 60 GB of memory could host at most about 60,000 threads, and even that is only the ideal case.
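That back-of-the-envelope bound can be written out explicitly; the sketch below simply restates the assumptions from the paragraph above (about 1 MB per xceiver thread, 60 GB of DataNode memory), so the result is illustrative rather than measured.

    public class XceiverMemoryEstimate {
        public static void main(String[] args) {
            // Figures taken from the text above: ~1 MB per xceiver thread, 60 GB of memory.
            long memoryPerThreadMb = 1;          // rough per-thread cost, mostly stack
            long dataNodeMemoryMb  = 60L * 1024; // 60 GB expressed in MB
            long theoreticalMax = dataNodeMemoryMb / memoryPerThreadMb;
            System.out.println("Theoretical upper bound on xceiver threads: " + theoreticalMax); // 61440
            // The JVM, the OS and the DataNode itself also need memory, which is why a setting
            // around 40000+ (as mentioned earlier) still leaves headroom below this bound.
        }
    }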

Some ways to estimate a value for dfs.datanode.max.xcievers: the most common approach, for servers used mainly for HBase, adds headroom on top of the estimated file count, where Reserve(20%) means allocating an extra 20% of capacity to allow for growth in the number of files and similar factors.
