NLTK Usage Summary

xiaoxiao2021-02-28  107

The PunktSentenceTokenizer class in the nltk.tokenize.punkt module splits text into sentences. It keeps punctuation, such as brackets, in the resulting sentences.

import nltk.data

text = '''
... Punkt knows that the periods in Mr. Smith and Johann S. Bach
... do not mark sentence boundaries. And sometimes sentences
... can start with non-capitalized words. i is a good variable
... name.
... '''

# Load the pre-trained English Punkt sentence tokenizer.
sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')

# Print each detected sentence, separated by a divider line.
print('\n-----\n'.join(sent_detector.tokenize(text.strip())))

# Output:
'''
...Punkt knows that the periods in Mr. Smith and Johann S. Bach do not mark sentence boundaries.
-----
'''
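To see why the pre-trained model matters, here is a minimal sketch that builds a PunktSentenceTokenizer directly instead of loading english.pickle. It assumes only that NLTK is installed. The sample sentence and the variable names are made up for illustration and do not come from the original post. An untrained tokenizer has no abbreviation list, so it should break after "Mr."; adding "mr" to PunktParameters.abbrev_types is one way to avoid that.

```python
from nltk.tokenize.punkt import PunktParameters, PunktSentenceTokenizer

# Hypothetical sample text, not taken from the original post.
sample = "Mr. Smith went home. He slept."

# An untrained tokenizer knows no abbreviations,
# so the period in "Mr." is treated as a sentence boundary.
default_tokenizer = PunktSentenceTokenizer()
default_sentences = default_tokenizer.tokenize(sample)

# Register "mr" as a known abbreviation
# (stored lowercase, without the trailing period).
params = PunktParameters()
params.abbrev_types.add("mr")
tuned_tokenizer = PunktSentenceTokenizer(params)
tuned_sentences = tuned_tokenizer.tokenize(sample)

print(default_sentences)
print(tuned_sentences)
```

The pre-trained English model already carries a learned abbreviation list, which is why the example above handles "Mr. Smith" without extra setup. Setting abbrev_types by hand is mainly useful for domain-specific abbreviations that the model has not seen.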
When reposting, please credit the original URL: https://www.6miu.com/read-62176.html
