https://blog.csdn.net/u0/article/details/
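Approach:
1. Read the file line by line and join the lines, separated by spaces, into one string str.
2. Use a regular expression to strip punctuation and symbols from the string, lowercase it, then split it on whitespace into a String[].
3. Use a HashMap to count how many times each word occurs (a blacklist can be added to filter out modal verbs and other common, low-information words).
4. Sort the HashMap entries by value (copy them into a List and call Collections.sort with a Comparator whose compare method is overridden).
5. Print the first 10 words of the sorted list, i.e. the 10 most frequent words.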
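Steps 2 to 5 can be sketched as one pure function that works on text already read into memory (step 1), which makes them easy to try without a file. This is a minimal sketch, not the author's code: the class name `TopWords`, the method `topWords`, the small stop-word set and the sample sentence are all hypothetical, and ties are broken alphabetically (an assumption the original does not make) so the output is deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TopWords {
    // Hypothetical helper: steps 2-5 applied to text that has already been read (step 1).
    public static List<Map.Entry<String, Integer>> topWords(String text, Set<String> stopWords, int k) {
        // Step 2: drop Unicode punctuation (\pP) and symbols (\pS), lowercase, split on whitespace
        String cleaned = text.replaceAll("\\pP|\\pS", "").toLowerCase();
        String[] tokens = cleaned.trim().split("\\s+");

        // Step 3: count occurrences in a HashMap, skipping blacklisted words and empty tokens
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            if (token.isEmpty() || stopWords.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }

        // Step 4: sort by count, descending; equal counts are ordered alphabetically
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        Collections.sort(entries, (a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        // Step 5: keep the first k entries (fewer if there are not enough distinct words)
        return entries.subList(0, Math.min(k, entries.size()));
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "on"));
        String text = "The cat sat on the mat. The cat ran! A dog ran, too.";
        for (Map.Entry<String, Integer> e : topWords(text, stopWords, 3)) {
            System.out.println(e.getKey() + ":" + e.getValue());
        }
    }
}
```

Running `main` prints `cat:2`, `ran:2` and `dog:1`, one per line. Separating the counting from file reading keeps the logic testable; the full program below still reads from a file as in step 1.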
Code:
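import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

public class Main {
    public static void main(String[] args) throws IOException {
        readPaper();
    }

    // Count the 10 most frequent words in an English article
    public static void readPaper() throws IOException {
        HashMap<String, Integer> wordMap = new HashMap<String, Integer>();
        File file = new File("e:/info.log");
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = br.readLine()) != null) {
                // Append a space so the last word of a line does not fuse with the first word of the next
                sb.append(line).append(' ');
            }
        }
        String words = sb.toString(); // the whole text as one string
        // Remove punctuation and symbols, then lowercase so "The" and "the" count as one word.
        // Lowercase p means "property": a Unicode property, used as a prefix in Unicode regexes.
        // Uppercase P is one of the seven Unicode general categories: punctuation.
        // Uppercase S is the category of symbols (math symbols, currency symbols, etc.).
        String target = words.replaceAll("\\pP|\\pS", "").toLowerCase();
        // Split on runs of whitespace; split(" ") would yield empty strings wherever spaces repeat
        String[] single = target.trim().split("\\s+");
        String[] keys = { "you", "i", "he", "she", "me", "him", "her", "it", "they", "them", "we", "us",
                "your", "yours", "our", "his", "its", "my", "in", "into", "on", "for", "out", "up", "down",
                "at", "to", "too", "with", "by", "about", "among", "between", "over", "from", "be", "been",
                "am", "is", "are", "was", "were", "without", "the", "of", "and", "a", "an", "that", "this",
                "or", "as", "will", "would", "can", "could", "may", "might", "shall", "should", "must",
                "has", "have", "had", "than" };
        // Replace common low-information words with "#" so they can be skipped when printing
        Set<String> stopWords = new HashSet<String>(Arrays.asList(keys));
        for (int i = 0; i < single.length; i++) {
            if (stopWords.contains(single[i])) {
                single[i] = "#";
            }
        }
        // Map each word to the number of times it occurs
        for (String w : single) {
            if (w.isEmpty()) {
                continue; // an empty file produces a single empty token
            }
            Integer n = wordMap.get(w);
            wordMap.put(w, n == null ? 1 : n + 1);
        }
        // Comparator: sort entries by value (count), descending
        List<Entry<String, Integer>> list = new ArrayList<Entry<String, Integer>>(wordMap.entrySet());
        Collections.sort(list, new Comparator<Entry<String, Integer>>() {
            @Override
            public int compare(Entry<String, Integer> o1, Entry<String, Integer> o2) {
                return Integer.compare(o2.getValue(), o1.getValue());
            }
        });
        // Print the 10 most frequent words, skipping the "#" placeholder
        int count = 0;
        for (Map.Entry<String, Integer> entry : list) {
            if (entry.getKey().equals("#")) {
                continue;
            }
            System.out.println(entry.getKey() + ":" + entry.getValue());
            count++;
            if (count == 10) {
                break;
            }
        }
    }
}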

If reposting, please credit the source: https://51itzy.com/kjqy/126571.html