Batch-reading CSV files with TensorFlow for deep learning algorithms

xiaoxiao, 2021-02-28

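So far I have used two deep learning frameworks, TensorFlow and deeplearning4j. Deep learning algorithms expect their input to be fed in batches. Earlier posts already covered several deeplearning4j examples; TensorFlow can also read data in batches and keep feeding it to the algorithm. I just came across an example online, http://www.cnblogs.com/hunttown/p/6844477.html . The data format is shown below; it is the iris dataset, which anyone doing machine learning should recognize: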

Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
21,5.4,3.4,1.7,0.2,Iris-setosa
22,5.1,3.7,1.5,0.4,Iris-setosa
23,4.6,3.6,1.0,0.2,Iris-setosa
24,5.1,3.3,1.7,0.5,Iris-setosa
25,4.8,3.4,1.9,0.2,Iris-setosa
26,5.0,3.0,1.6,0.2,Iris-setosa
27,5.0,3.4,1.6,0.4,Iris-setosa
28,5.2,3.5,1.5,0.2,Iris-setosa
29,5.2,3.4,1.4,0.2,Iris-setosa
30,4.7,3.2,1.6,0.2,Iris-setosa
31,4.8,3.1,1.6,0.2,Iris-setosa
32,5.4,3.4,1.5,0.4,Iris-setosa
33,5.2,4.1,1.5,0.1,Iris-setosa
34,5.5,4.2,1.4,0.2,Iris-setosa
35,4.9,3.1,1.5,0.1,Iris-setosa
36,5.0,3.2,1.2,0.2,Iris-setosa
37,5.5,3.5,1.3,0.2,Iris-setosa
39,5.5,4.2,1.4,0.2,Iris-virginica
40,4.9,3.1,1.5,0.1,Iris-versicolor
38,5.0,3.2,1.2,0.2,Iris-versicolor
51,5.5,3.5,1.3,0.2,Iris-versicolor

The program implementation follows:

import tensorflow as tf

path = "/Users/shuubiasahi/Desktop/业务相关文档/iris.csv"


def read_data(file_queue):
    # Read the file line by line, skipping the CSV header row.
    reader = tf.TextLineReader(skip_header_lines=1)
    key, value = reader.read(file_queue)
    # The defaults also set each column's dtype: int Id, four floats, string Species.
    defaults = [[0], [0.], [0.], [0.], [0.], ['']]
    Id, SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm, Species = tf.decode_csv(value, defaults)
    # Map the species name to an integer label; -1 marks an unknown species.
    preprocess_op = tf.case({
        tf.equal(Species, tf.constant('Iris-setosa')): lambda: tf.constant(0),
        tf.equal(Species, tf.constant('Iris-versicolor')): lambda: tf.constant(1),
        tf.equal(Species, tf.constant('Iris-virginica')): lambda: tf.constant(2),
    }, lambda: tf.constant(-1), exclusive=True)
    return tf.stack([SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm]), preprocess_op


def create_pipeline(filename, batch_size, num_epochs=None):
    file_queue = tf.train.string_input_producer([filename], num_epochs=num_epochs)
    example, label = read_data(file_queue)
    min_after_dequeue = 1000
    capacity = min_after_dequeue + batch_size
    example_batch, label_batch = tf.train.shuffle_batch(
        [example, label], batch_size=batch_size, capacity=capacity,
        min_after_dequeue=min_after_dequeue
    )
    return example_batch, label_batch


x_train_batch, y_train_batch = create_pipeline(path, 5, num_epochs=1000)
x_test, y_test = create_pipeline(path, 60)

init_op = tf.global_variables_initializer()
local_init_op = tf.local_variables_initializer()  # local variables like epoch_num, batch_size

with tf.Session() as sess:
    sess.run(init_op)
    sess.run(local_init_op)
    # Start populating the filename queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    # Retrieve a single instance:
    try:
        # while not coord.should_stop():
        for _ in range(6):
            example, label = sess.run([x_test, y_test])
            print(example)
            print(label)
    except tf.errors.OutOfRangeError:
        print('Done reading')
    finally:
        coord.request_stop()
        coord.join(threads)
        sess.close()
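To make the steps in the pipeline easier to follow, here is a minimal framework-free sketch of the same idea in plain Python: skip the header, parse each row, map the species name to an integer label, then shuffle through a bounded buffer and emit batches. It is not TensorFlow code and does not reproduce the queue runners or threading. The names SAMPLE_CSV, LABELS, parse_rows and shuffled_batches are hypothetical, introduced only for illustration, and the buffer_size parameter only loosely plays the role of min_after_dequeue.

```python
import csv
import io
import random

# Hypothetical in-memory sample using the same column layout as iris.csv above.
SAMPLE_CSV = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
21,5.4,3.4,1.7,0.2,Iris-setosa
22,5.1,3.7,1.5,0.4,Iris-setosa
39,5.5,4.2,1.4,0.2,Iris-virginica
40,4.9,3.1,1.5,0.1,Iris-versicolor
38,5.0,3.2,1.2,0.2,Iris-versicolor
"""

# Same label mapping as the tf.case call in the listing.
LABELS = {"Iris-setosa": 0, "Iris-versicolor": 1, "Iris-virginica": 2}


def parse_rows(text):
    """Yield (features, label) pairs, skipping the header line."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # plays the role of skip_header_lines=1
    for row in reader:
        if not row:
            continue  # ignore blank lines
        features = [float(v) for v in row[1:5]]  # the Id column is dropped
        label = LABELS.get(row[5], -1)  # -1 for an unknown species
        yield features, label


def shuffled_batches(examples, batch_size, buffer_size, seed=None):
    """Yield batches drawn at random from a bounded shuffle buffer."""
    rng = random.Random(seed)
    buffer, batch = [], []
    for example in examples:
        buffer.append(example)
        # Once the buffer is over its limit, release one random element.
        if len(buffer) > buffer_size:
            batch.append(buffer.pop(rng.randrange(len(buffer))))
            if len(batch) == batch_size:
                yield batch
                batch = []
    # Input exhausted: drain what is left in random order.
    rng.shuffle(buffer)
    for example in buffer:
        batch.append(example)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # this sketch keeps a final, smaller batch


if __name__ == "__main__":
    rows = parse_rows(SAMPLE_CSV)
    for features_and_labels in shuffled_batches(rows, batch_size=2, buffer_size=3):
        print(features_and_labels)
```

A larger buffer_size mixes the rows more thoroughly at the cost of holding more examples in memory, which is the same trade-off the capacity and min_after_dequeue arguments control in the listing.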
When reposting, please cite the original address: https://www.6miu.com/read-58502.html
