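import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.extractor.WordExtractor;

// Reads a Word .doc file and returns its text, one paragraph per line,
// with Word field codes stripped from each paragraph.
public String readDoc(File file) {
    StringBuilder buffer = new StringBuilder();
    InputStream input = null;
    try {
        input = new FileInputStream(file);
        WordExtractor extractor = new WordExtractor(input);
        String[] paragraphs = extractor.getParagraphText();
        for (String paragraph : paragraphs) {
            // stripFields is a static helper, so call it on the class
            buffer.append(WordExtractor.stripFields(paragraph)).append("\n");
        }
    } catch (Exception e) {
        // On failure, log and fall through: whatever was read so far is returned
        e.printStackTrace();
    } finally {
        if (input != null) {
            try {
                input.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return buffer.toString();
}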
    

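Method for stripping field codes: extractor.stripFields(paragraph); (stripFields is a static method, so WordExtractor.stripFields(paragraph) works as well)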
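To make clearer what "stripping fields" means, here is a minimal stand-alone sketch that does not use POI. It assumes the usual Word binary convention that a field is delimited by the control characters 0x13 (field begin), 0x14 (separator) and 0x15 (field end), with the field code before the separator and the visible result after it. The class FieldStripSketch and the method stripFieldCodes are hypothetical names made up for this illustration; this is not POI's actual implementation, only a rough approximation of the effect.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class FieldStripSketch {
    // Control characters that delimit a field in Word's binary text stream
    private static final char FIELD_BEGIN = 0x13;
    private static final char FIELD_SEP = 0x14;
    private static final char FIELD_END = 0x15;

    // Hypothetical helper: drops field codes and field marks, keeps field results.
    static String stripFieldCodes(String text) {
        StringBuilder out = new StringBuilder();
        // One entry per open field: true while we are still inside its code part
        Deque<Boolean> inCode = new ArrayDeque<>();
        int codeDepth = 0; // number of open fields whose code part we are inside
        for (char c : text.toCharArray()) {
            if (c == FIELD_BEGIN) {
                inCode.push(true);
                codeDepth++;
            } else if (c == FIELD_SEP) {
                // Switch the innermost field from "code" to "result"; stray separators are dropped
                if (!inCode.isEmpty() && inCode.peek()) {
                    inCode.pop();
                    inCode.push(false);
                    codeDepth--;
                }
            } else if (c == FIELD_END) {
                // Close the innermost field; a field without a separator vanishes entirely
                if (!inCode.isEmpty() && inCode.pop()) {
                    codeDepth--;
                }
            } else if (codeDepth == 0) {
                out.append(c); // ordinary text or field result
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "See " + FIELD_BEGIN + " HYPERLINK \"http://example.com\" "
                + FIELD_SEP + "the link" + FIELD_END + " here.";
        System.out.println(stripFieldCodes(raw)); // prints: See the link here.
    }
}
```

Without this step, the raw paragraph text would typically still contain instructions such as HYPERLINK "..." mixed into the visible text, which is why each paragraph is passed through stripFields before being appended.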

Article on extracting document content (Excel, PDF, Word…):

http://blog.sina.com.cn/s/blog_67b9ad8d01010bwa.html

Thread where the problem came up:

http://bbs.csdn.net/topics/320055955