Word 97-2003 document to HTML conversion - programmatically

Be careful when you consider using Office client automation: not only does it scale poorly, but you can also run into a lot of problems with memory, macros, pop-up dialogs...

From support.microsoft.com/kb/257757: "Microsoft does not currently recommend, and does not support, Automation of Microsoft Office applications from any unattended, non-interactive client application or component (including ASP, ASP.NET, DCOM, and NT Services), because Office may exhibit unstable behavior and/or deadlock when Office is run in this environment."

Other options are OpenXML (which I recommend if you can switch to Office 2007 documents) and third-party libraries (such as Aspose).
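To illustrate why the OpenXML route avoids automation entirely: an Office 2007 .docx file is a ZIP package whose main text lives in word/document.xml, so it can be read with standard libraries alone. The following is a minimal sketch, not a full converter. The function name docx_to_basic_html is hypothetical, and it assumes you only want paragraph text, ignoring styles, tables, images and lists.

```python
import html
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_basic_html(docx_file):
    """Return one <p> element per Word paragraph (hypothetical helper).

    docx_file may be a path or a binary file-like object.
    Formatting, tables and images are deliberately ignored.
    """
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # A paragraph's text is spread over one or more <w:t> runs.
        text = "".join(run.text or "" for run in paragraph.iter(W_NS + "t"))
        paragraphs.append("<p>" + html.escape(text) + "</p>")
    return "\n".join(paragraphs)
```

This only applies once the documents are in the 2007 format; it does not read the binary Word 97-2003 format.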

Thanks, I went with Aspose... It works great! – Andrei Rinea Oct 3 '08 at 14:40

I have come across WV, a C library that reads some Microsoft Word formats (and is used for document conversion in AbiWord). Since it is written in C, you could perhaps run it as a daemon, or wrap it in COM and use interop. It looks like it would need at least a little work before it would be useful to you.

You could also look at whether OpenOffice.org has a separate converter component.
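As a rough sketch of that idea: LibreOffice (the successor to OpenOffice.org) ships a soffice binary that can convert documents from the command line without a GUI. The example below assumes LibreOffice is installed and that soffice is on the PATH. The helper names build_convert_command and convert_doc_to_html are hypothetical, and older OpenOffice.org builds may not accept the same flags.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir, soffice="soffice"):
    """Build the argument list for a headless conversion to HTML."""
    return [
        soffice,
        "--headless",            # no GUI, so no dialogs can appear
        "--convert-to", "html",
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def convert_doc_to_html(doc_path, out_dir, soffice="soffice", timeout=120):
    """Convert one document and return the expected output path.

    Assumes LibreOffice is installed; raises if the binary is missing,
    the process fails, or the timeout (in seconds) expires.
    """
    if shutil.which(soffice) is None:
        raise FileNotFoundError(f"{soffice!r} not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        build_convert_command(doc_path, out_dir, soffice),
        check=True,
        timeout=timeout,
    )
    # The converter writes <original name>.html into out_dir.
    return out_dir / (Path(doc_path).stem + ".html")
```

Running the converter as a separate process with a timeout means a stuck conversion can be killed without taking the calling application down with it.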

COM Interop with the Word application is out of the question because it doesn't scale AT ALL. I have tried generating XLS files that way, and it takes seconds per small file even on a multi-core server. It also requires Office to be installed on the server, and modal dialogs can sometimes pop up on the server during conversion, which hangs every process related to Office.

You could have mentioned the server requirement in your question. – Daemin Oct 2 '08 at 13:10

Ooops.. :"> I'll add it now.. – Andrei Rinea Oct 2 '08 at 13:19

I don't know of any conversion libraries specifically, but you could try using Word's COM interface from a language that can use COM to "parse" the document and extract the text and formatting. It doesn't even have to be C#; I'd use something like Ruby because it's quick to develop in and has reasonable COM interaction capabilities. There should also be a way of using COM to activate Word's own "Save as HTML" option for the document.

Found this documentation that provides what you essentially need. Just use the save method on that object and specify the format to be an HTML variant. Apart from using COM, there's a Java-based library, POI, that interacts with Microsoft formats. It's not .NET, but it's something else that we use at work to interact with Excel files (the other being COM).
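For completeness, here is a minimal sketch of the COM "Save as HTML" route described above. It assumes Windows, an installed copy of Word, and the pywin32 package; the function names are hypothetical. Given Microsoft's warning about unattended automation quoted elsewhere in this thread, treat it as something for an interactive desktop session rather than a server.

```python
import os

# Values from Word's WdSaveFormat enumeration
WD_FORMAT_HTML = 8            # wdFormatHTML: round-trip HTML with Office markup
WD_FORMAT_FILTERED_HTML = 10  # wdFormatFilteredHTML: cleaner HTML


def html_save_format(filtered=True):
    """Pick the WdSaveFormat value to pass to Document.SaveAs."""
    return WD_FORMAT_FILTERED_HTML if filtered else WD_FORMAT_HTML


def save_as_html(doc_path, html_path, filtered=True):
    """Open a .doc in Word via COM and save it as HTML (hypothetical helper)."""
    import win32com.client  # pywin32; imported here because it is Windows-only

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    word.DisplayAlerts = 0  # wdAlertsNone: suppress most prompts
    try:
        # Positional arguments: FileName, ConfirmConversions, ReadOnly
        doc = word.Documents.Open(os.path.abspath(doc_path), False, True)
        try:
            doc.SaveAs(os.path.abspath(html_path), html_save_format(filtered))
        finally:
            doc.Close(False)  # do not save changes to the source document
    finally:
        word.Quit()
```

Word needs absolute paths because its working directory is not the caller's, which is why both paths go through os.path.abspath.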

