Python and Natural Language Processing (6): Turning Chinese Text into Images

xiaoxiao 2021-02-28

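I have recently been using word2vec to represent text as vectors and then classifying the text with a CNN, in imitation of CNN-based image classification. Since the approach treats text like an image, it should be possible to render the text as an actual image and see what the vectorized text looks like.

Python's basic image-handling module is the Image library (PIL). Because this experiment uses Python 3, the package to install is Pillow (pip install Pillow / conda install Pillow). The word vector model comes from gensim's word2vec tool; the detailed steps for that are covered in a separate post.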

# -*- coding: utf-8 -*-
from gensim import models
import numpy as np
from PIL import Image

text_width = 10  # minimum number of rows (words) per text image

# Load the word2vec model
word_vector_size = 25
base_model_path = './word_vector_'
modelpath = base_model_path + str(word_vector_size)
emotion_model = models.Word2Vec.load(modelpath)
# Look words up through .wv; indexing the model object directly no longer works in gensim 4
word_vectors = emotion_model.wv


# Get the vector for a single word
def getCharVec(char):
    vector = np.zeros(word_vector_size)
    if char in word_vectors:
        vector[0:word_vector_size] = word_vectors[char]
    else:
        # Words missing from the model are initialized randomly from a normal distribution
        loc, scale = 0, 0.5  # mean and standard deviation
        vector[0:word_vector_size] = np.random.normal(loc, scale, word_vector_size)
    return vector


# Get the sentence matrix: one row per word, zero rows as padding up to text_width
def getSentenceVec(sentence):
    vectors = np.zeros((text_width, word_vector_size))
    sentence_array = sentence.split()
    # Visit every word, but never write past the last row of the matrix
    for i in range(min(len(sentence_array), text_width)):
        vectors[i] = getCharVec(sentence_array[i])
    return vectors


def drawPic(vectors, savepath):
    # A float array becomes a mode 'F' (32-bit float) image
    img = Image.fromarray(vectors.astype(np.float32))
    img.save(savepath)


if __name__ == '__main__':
    # getSentenceVec splits on whitespace, so each text should already be
    # word-segmented with spaces between the words
    text1 = '预告片里的服化道时而素雅时而华贵晓彤演绎的刘楚玉演技提高很值得期待'
    text2 = '这部剧的摄影师和剪辑师是不是已经做好了随时领盒饭的准备了'
    # Grow text_width so that the longer of the two texts fits
    text_width = max(text_width, len(text1.split()), len(text2.split()))
    vectors = getSentenceVec(text1)
    print(vectors)
    drawPic(vectors, './text1.tiff')
    vectors = getSentenceVec(text2)
    print(vectors)
    drawPic(vectors, './text2.tiff')
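The word-to-row mapping and zero padding can be tried without a trained model. The following is only a minimal sketch under stated assumptions: sentence_to_matrix, toy_lookup, the 4-dimensional vectors and the hand-segmented sample phrase are all hypothetical stand-ins for the word2vec lookup, not part of the original script.

```python
import numpy as np


def sentence_to_matrix(sentence, lookup, text_width=10, dim=4, seed=0):
    """Map a space-separated sentence to a (text_width, dim) matrix, one row per word."""
    rng = np.random.default_rng(seed)
    matrix = np.zeros((text_width, dim))
    words = sentence.split()
    # Words beyond text_width are dropped; unused rows stay zero (padding)
    for i, word in enumerate(words[:text_width]):
        if word in lookup:
            matrix[i] = lookup[word]
        else:
            # Unknown word: random normal vector, mean 0 and standard deviation 0.5
            matrix[i] = rng.normal(0.0, 0.5, dim)
    return matrix


# Hypothetical toy "model": two known words with made-up vectors
toy_lookup = {
    "演技": np.full(4, 0.1),
    "期待": np.full(4, -0.2),
}

# Hand-segmented phrase, used purely for illustration
m = sentence_to_matrix("演技 提高 很 值得 期待", toy_lookup, text_width=6)
print(m.shape)
```

Each row of the matrix is what later becomes one row of pixels, so a longer sentence gives a taller image and a larger vector size gives a wider one.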

Running the full script with the word2vec model produces the following two images:

text1.tiff

text2.tiff

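Note: the images are stored as TIFF because the array holds float data. Saving it directly as PNG raises the error "can't save mode 'F' image", so, following a related answer, I switched the format to TIFF and the script ran successfully.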
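If a PNG is needed anyway, one possible workaround (not the approach used in this post) is to rescale the float values to 8-bit grayscale before saving. The sketch below assumes simple min-max scaling; to_uint8 and save_png are hypothetical helper names, and the rescaling discards the absolute magnitudes of the vectors, so the result is only suitable for viewing.

```python
import numpy as np


def to_uint8(vectors):
    """Min-max scale a float array into the range 0..255 and return it as uint8."""
    vectors = np.asarray(vectors, dtype=np.float64)
    low, high = vectors.min(), vectors.max()
    if high == low:
        # A constant array has no contrast; return an all-black image
        return np.zeros(vectors.shape, dtype=np.uint8)
    scaled = (vectors - low) / (high - low) * 255.0
    return np.round(scaled).astype(np.uint8)


def save_png(vectors, savepath):
    """Save a float matrix as an 8-bit grayscale image (mode 'L')."""
    # Imported here so that to_uint8 can be used without Pillow installed
    from PIL import Image
    Image.fromarray(to_uint8(vectors)).save(savepath)


demo = np.array([[0.0, 1.0], [2.0, 5.0]])
pixels = to_uint8(demo)
print(pixels.dtype, pixels.min(), pixels.max())
```

Calling save_png(vectors, './text1.png') on the sentence matrix would then write a PNG, with the smallest value shown as black and the largest as white.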

When reposting, please credit the original URL: https://www.6miu.com/read-2624857.html
