A Simple Python Application: Counting Characters in an Article

xiaoxiao 2021-02-27

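Python is a good tool for data processing. As a small first exercise, this post uses Python to count how often each character appears in an article.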

System: Ubuntu 16.04

Python version: 3.4

Text: an excerpt of Journey to the West (《西游记》) as a txt file

Result: saved in result.csv

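    # These two lines show Python's default string encoding; the result is: utf-8
    import sys
    print(sys.getdefaultencoding())

    # characters list: stores every distinct character that appears
    # stat dict: the character is the key, its number of occurrences is the value
    characters = []
    stat = {}

    # A brute-force list of whitespace and punctuation that may appear;
    # these symbols are skipped when counting characters
    skip = [' ', '\n', '\t', '，', '。', '？', '《', '》', '！', '、', '：', '“', '”', '；']

    # Pass encoding explicitly so that reading does not depend on the system locale
    with open('data.txt.utf8', 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            # If a line is empty after stripping whitespace, skip it
            if len(line) == 0:
                continue
            for ch in line:
                if ch in skip:
                    continue
                # If the character is not yet in the characters list, add it
                if ch not in characters:
                    characters.append(ch)
                # If the character is not yet in the stat dict, add it with a value of 0
                # Python 2 version: if not stat.has_key(ch):
                if ch not in stat:
                    stat[ch] = 0
                # Increase the count for the current character by 1
                stat[ch] += 1

    # Print the result
    print(characters)
    for key, value in stat.items():
        print(key, value)

    # Check the lengths of characters and stat, i.e. how many elements each holds
    print('Length of the characters list: ' + str(len(characters)))
    print('Length of the stat dict: ' + str(len(stat)))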
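The punctuation list above only skips the symbols it names, so any other symbol, digit, or Latin letter in the text is counted as well. One possible alternative, sketched here and not taken from the original post, is to keep only characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) and let `collections.Counter` do the counting. The helper names `is_hanzi` and `count_hanzi` and the sample lines are hypothetical, and treating that single Unicode block as "Chinese characters" is a simplifying assumption.

```python
from collections import Counter


def is_hanzi(ch):
    """Assumption: a 'Chinese character' is anything in U+4E00..U+9FFF."""
    return '\u4e00' <= ch <= '\u9fff'


def count_hanzi(lines):
    """Count Chinese characters across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Counter.update adds one to the count of every item it is given
        counts.update(ch for ch in line if is_hanzi(ch))
    return counts


# Hypothetical sample lines, standing in for the lines of the input file
sample = ['话说唐僧，师徒四人。', 'Hello, 唐僧!']
counts = count_hanzi(sample)
print(counts['唐'], counts['僧'], len(counts))  # prints: 2 2 8
```

Because an open file is itself an iterable of lines, `count_hanzi` could be called directly on the file object. Unlike the `characters` list, this sketch does not record the order in which characters first appear.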

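The full output (the characters list plus one line per character) is too long to display conveniently, so we do some simple post-processing: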

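    # Some simple data processing
    # Convert the stat dict into a list of (character, count) pairs,
    # sorted by count in descending order
    stat = sorted(stat.items(), key=lambda d: d[1], reverse=True)

    # Print the type and length of stat at this point
    print(type(stat), len(stat))

    # Print the first ten characters in the characters list
    for ch in characters[:10]:
        print(ch)

    print('******************************')

    # Print the first ten entries of the stat list
    for ch, count in stat[:10]:
        print(ch, count)

    # Save the results to a CSV file
    # The count must be converted to a string before it is written
    with open('result.csv', 'w', encoding='utf-8') as fw:
        for item in stat:
            fw.write(item[0] + ',' + str(item[1]) + '\n')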
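Joining fields with `','` by hand works as long as no counted character is itself an ASCII comma or a double quote, and neither of those is in the skip list. As a hedged sketch of a more defensive variant, not the original author's method, the standard `csv` module can do the quoting. The function names `sort_by_count` and `write_counts` and the example counts are made up for illustration.

```python
import csv
import io


def sort_by_count(stat):
    """Return (character, count) pairs, highest count first."""
    return sorted(stat.items(), key=lambda item: item[1], reverse=True)


def write_counts(rows, fileobj):
    """Write (character, count) rows as CSV; fields are quoted only when needed."""
    writer = csv.writer(fileobj, lineterminator='\n')
    writer.writerows(rows)


# Hypothetical counts, including an ASCII comma that slipped past the filter
example = {'的': 3, ',': 5, '山': 1}
rows = sort_by_count(example)

# An in-memory buffer stands in for result.csv so the sketch has no side effects
buffer = io.StringIO()
write_counts(rows, buffer)
print(buffer.getvalue())
```

To write a real file, the same function could be given `open('result.csv', 'w', newline='', encoding='utf-8')`; the `csv` documentation recommends `newline=''` for files passed to `csv.writer`.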

When reposting, please credit the original: https://www.6miu.com/read-9565.html
