Read Word doc into string

Aug 30, 2012 at 2:47 PM

Hi all, I have the following code:

    public string ReadAllTextFromWordDocFile(string fileName)
    {
        using (StreamReader streamReader = new StreamReader(fileName))
        {
            var document = new HWPFDocument(streamReader.BaseStream);
            //var document2 = new XWPFDocument(streamReader.BaseStream);
            var wordExtractor = new WordExtractor(document);
            var docText = new StringBuilder();
            foreach (string text in wordExtractor.ParagraphText)
            {
                docText.AppendLine(text.Trim());
            }
            streamReader.Close();
            return docText.ToString();
        }
    }

I have also referenced the following:

    using NPOI;
    using NPOI.POIFS.FileSystem;

But I get the error: The type or namespace name 'HWPFDocument' could not be found.

Can anyone explain where I'm going wrong?

Thanks

Coordinator
Sep 15, 2012 at 10:14 PM

HWPFDocument is not included in any official release because HWPF is still very unstable. Anyway, you can find the source code from ScratchPad and compile by yourself.

Jul 2, 2013 at 7:22 AM

Not included?
It is mentioned in a post at http://stackoverflow.com/questions/9672461/trying-to-read-an-ms-office-document
I can't understand what is meant by the following: "Anyway, you can find the source code from ScratchPad and compile by yourself."
Does the library support this or not?

Coordinator
Nov 17, 2013 at 11:20 PM

It is supported, but not in an official release. The ScratchPad namespace is for testing purposes; none of the source code in it is stable.

Apr 15 at 1:37 PM

Hi, is HWPFDocument going to be included in the official release at some point, or is there another way we can use NPOI to extract text from Word 2003 documents?

Coordinator
Wed at 5:53 AM

We are going to include it in NPOI 2.3, but we need to make it more stable first.