[Java] Spam judgment using morphological analysis "lucene-gosen"

What I did

As a first step toward machine learning, I tried spam judgment using morphological analysis. The code is written in Java and runs on the Play Framework.

What is lucene-gosen

lucene-gosen is a morphological analysis tool for Java. You can use it just by adding its jar to your project.

Download the jar from the following site: https://code.google.com/p/lucene-gosen/

Reading the file you want to analyze and writing out the tokens

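import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import net.java.sen.SenFactory;
import net.java.sen.StreamTagger;
import net.java.sen.StringTagger;
import net.java.sen.dictionary.Token;

public static void writing(File readfile, File writefile) throws IOException {
	// null means no explicit dictionary directory is given, so the default dictionary is used.
	StringTagger stringTagger = SenFactory.getStringTagger(null);

	// try-with-resources closes both files even if tagging fails partway.
	try (Reader reader = new InputStreamReader(new FileInputStream(readfile), "UTF-8");
			BufferedWriter bw = new BufferedWriter(new FileWriter(writefile))) {
		StreamTagger tagger = new StreamTagger(stringTagger, reader);
		// Write the surface form of each token on its own line.
		while (tagger.hasNext()) {
			Token token = tagger.next();
			bw.write(token.getSurface());
			bw.newLine();
		}
	}
}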

Count the segmented words and sort them in descending order of frequency.

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class Wordseparated {
	// CountTable is a word-frequency helper class whose source is not included in this post.
	public CountTable count(String readfile, String writefile) throws IOException {
		CountTable table = new CountTable();

		// Tally every whitespace-separated word in the input file.
		try (BufferedReader brfile = new BufferedReader(new FileReader(readfile))) {
			String linefile;
			while ((linefile = brfile.readLine()) != null) {
				for (String s : linefile.split("\\s+")) {
					if (!s.isEmpty()) {
						table.add(s);
					}
				}
			}
		}

		// Write the words in the order returned by getKeysByCount(), one per line.
		try (BufferedWriter bwfile = new BufferedWriter(new FileWriter(writefile))) {
			for (String s : table.getKeysByCount()) {
				bwfile.write(s);
				bwfile.newLine();
			}
		}
		return table;
	}
}
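
The CountTable class used above is not shown in this post. As a rough idea of what such a class might look like, here is a minimal sketch. It is an assumption, not the original implementation: only the method names add, get, and getKeysByCount come from the code above, while the internal HashMap, the return types, and the alphabetical tie-breaking are choices made here for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the CountTable used by Wordseparated.
// The real class is not shown in the post; this is only one possible shape.
class CountTable {
    private final Map<String, Integer> counts = new HashMap<>();

    // Increment the count for a word, starting from 1 if it has not been seen.
    public void add(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    // Return the current count for a word, or 0 if it was never added.
    public int get(String word) {
        return counts.getOrDefault(word, 0);
    }

    // Return all words ordered by descending count.
    // Ties are broken alphabetically (an assumption, so the order is deterministic).
    public List<String> getKeysByCount() {
        List<String> keys = new ArrayList<>(counts.keySet());
        keys.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return keys;
    }
}
```

With a table like this, adding "spam", "ham", "spam" gives a count of 2 for "spam", and getKeysByCount() lists "spam" before "ham".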
