word2html

Costas

A frontend for the mwilliamson.dotnet-mammoth DLL. The goal is to convert a .docx file to .html without Microsoft's inline CSS styles, while keeping the .docx formatting. Until now, most people have tried to clean up the HTML that Word exports. dotnet-mammoth takes the other route: it reads the .docx itself and writes the .html.

Keep in mind that all images are embedded in the output HTML as base64. If you find the HTML 'heavy', rename the .docx to .zip and open the \word\media folder inside it. Before doing the conversion, optimize the images inside the .docx. For example, use the 'batch convert' feature of the freeware RIOT (tip: the quality set on the main screen is inherited by batch convert).
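A .docx is a ZIP archive, which is why renaming it to .zip works. The Python sketch below is not part of word2html. It shows one way to see which images will make the output heavy: it lists the files under `word/media/` and estimates their base64 size, which is about 4/3 of the raw size. `media_report` is a hypothetical helper.

```python
import zipfile

def media_report(docx):
    """List images stored under word/media/ in a .docx (a ZIP archive).

    `docx` may be a path or a binary file-like object. Returns tuples of
    (archive name, raw size in bytes, estimated size once base64-encoded).
    """
    report = []
    with zipfile.ZipFile(docx) as z:
        for info in z.infolist():
            if info.filename.startswith("word/media/") and not info.is_dir():
                raw = info.file_size
                # base64 emits 4 characters for every 3 input bytes (padded up)
                encoded = 4 * ((raw + 2) // 3)
                report.append((info.filename, raw, encoded))
    return report
```

The largest entries are the ones worth recompressing before conversion.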

Because there are many Word versions and styles, plus user-customized styles, the dotnet-mammoth DLL cannot identify every style.
The application works in two stages. The first stage reports which styles were not recognized, i.e. paragraphs that would be exported as normal text instead of as headings. The second stage does the actual conversion. To achieve this, the first stage adds every 'Unrecognised paragraph style' warning to the list on the right. The user can edit that list and check or uncheck whether each unknown style should be processed as a known one.
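The two stages can be sketched in Python as below. This is an illustration under assumptions, not word2html's code. The warning text is assumed to follow mammoth's `Unrecognised paragraph style: 'Name' (Style ID: Id)` format. `collect_unknown_styles` and `build_style_map` are hypothetical helpers, and the HTML target for each checked style is whatever the user picks. The output uses mammoth's style-map syntax (`p[style-name='...'] => h1:fresh`), so a checked style is exported as a heading instead of plain text.

```python
import re

# Assumed warning format, modelled on mammoth's unrecognised-style messages.
WARNING_RE = re.compile(r"Unrecognised paragraph style: '(?P<name>[^']*)'")

def collect_unknown_styles(messages):
    """Stage 1: unique paragraph-style names from the warnings, first-seen order."""
    names = []
    for msg in messages:
        m = WARNING_RE.search(msg)
        if m and m.group("name") not in names:
            names.append(m.group("name"))
    return names

def build_style_map(choices):
    """Stage 2: turn the user's checked styles into style-map lines.

    `choices` maps style name -> (checked, html_target), e.g. ("h1").
    Unchecked styles are skipped, so they stay plain paragraphs.
    Names containing a single quote are not handled in this sketch.
    """
    lines = [
        f"p[style-name='{name}'] => {target}:fresh"
        for name, (checked, target) in choices.items()
        if checked
    ]
    return "\n".join(lines)
```

Keeping the list editable between the stages lets the user fix a wrong guess (for example, mapping a localized heading style such as 'Titre 1') without rerunning detection.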

The application also works in batch mode, so the user can convert multiple .docx files at once. Add files to the main list by dragging and dropping them.
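Filling the batch list from a drop can be sketched in Python as follows. This is not the application's C# code, and `add_to_batch` is a hypothetical helper. In this sketch, only .docx files are accepted, duplicates and Word's `~$` lock files are skipped, and each input is paired with an output path that uses the `.html` extension.

```python
from pathlib import Path

def add_to_batch(batch, dropped):
    """Append dropped paths to `batch` as (input .docx, output .html) pairs.

    Non-.docx files, Word lock files ("~$...") and entries already in
    the list are ignored. Returns the same list for convenience.
    """
    known = {src for src, _ in batch}
    for p in map(Path, dropped):
        is_docx = p.suffix.lower() == ".docx"
        if is_docx and not p.name.startswith("~$") and p not in known:
            batch.append((p, p.with_suffix(".html")))
            known.add(p)
    return batch
```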


File size comparison (source .docx without images):
source .docx: 32.7 KB
Word filtered HTML export: 60 KB
word2html export: 19 KB



Platform: C# 2012
Operating System: Win7 / Win10 (32-bit)
Requirements: .NET Framework v4.5
Filesize: 69 KB

Download

Source code

Reference: My tiny side project has had more impact