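Among the open-source Chinese analyzers, I chose Paoding Analysis. Link: http://code.google.com/p/paoding

The configuration is as follows:
Add to the CLASSPATH: E:\eclipse\paoding-analysis.properties
Add a user environment variable: PAODING_DIC_HOME => E:\dic
Put the dic files (the plain-text word dictionary files) into that directory.

Test code: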
package test;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import net.paoding.analysis.analyzer.PaodingAnalyzer;
import net.paoding.analysis.analyzer.PaodingTokenizer;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;

public class Test1 {
    public static void main(String[] argv) {
        // PaodingAnalyzer loads its dictionaries from PAODING_DIC_HOME.
        Analyzer analyzer = new PaodingAnalyzer();
        String testString = "中华人民共和国";
        Reader r = new StringReader(testString);
        // The field name is irrelevant for this test, so an empty string is passed.
        PaodingTokenizer ts = (PaodingTokenizer) analyzer.tokenStream("", r);
        Token t;
        try {
            // next() returns null once the input is exhausted.
            while ((t = ts.next()) != null) {
                System.out.println(t);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Result:
(中华,0,2)
(华人,1,3)
(人民,2,4)
(共和,4,6)
(共和国,4,7)
log4j:WARN No appenders could be found for logger (net.paoding.analysis.knife.PaodingMaker).
log4j:WARN Please initialize the log4j system properly.

The two log4j lines only mean that no log4j configuration was found; they do not affect the segmentation result.
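To make the result easier to read: each printed token has the form (term,start,end), where start is the index of the term's first character in the input and end is one past its last character. The sketch below does not use Paoding at all. It takes the offsets from the output above and cuts the same terms out of the input with String.substring, to show how the numbers map back to the text. The class name OffsetDemo and its helper methods are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetDemo {
    // Returns the text covered by a token's [start, end) character offsets.
    public static String slice(String text, int start, int end) {
        return text.substring(start, end);
    }

    // Rebuilds "(term,start,end)" lines from the input text and the offsets alone.
    public static List<String> render(String text, int[][] offsets) {
        List<String> lines = new ArrayList<>();
        for (int[] o : offsets) {
            lines.add("(" + slice(text, o[0], o[1]) + "," + o[0] + "," + o[1] + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "中华人民共和国";
        // Offsets copied from the test run shown in this post.
        int[][] offsets = { {0, 2}, {1, 3}, {2, 4}, {4, 6}, {4, 7} };
        for (String line : render(text, offsets)) {
            System.out.println(line);
        }
    }
}
```

The offsets overlap (for example 华人 at 1-3 sits between 中华 and 人民), so the analyzer is emitting every dictionary word it finds in the input rather than a single non-overlapping split.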