HighDots Forums  

Recommend a utility to clean up msOffice HTML, but leave certain basic formatting?

Websites/HTML pages critique & reviews: Discuss and review existing WWW material (alt.html.critique)


#1 ship, 04-14-2007, 08:44 AM
Recommend a utility to clean up msOffice HTML, but leave certain basic formatting?

Hi

Can anyone recommend a really powerful utility to strip out Microsoft
Word (2003) garbage from HTML?

I need to keep the basics of formatting i.e.
- Bolds
- Italics
- Bullets

Plus I need to keep the very basic table tags
<TABLE>, <TR>, <TD>
so that I can keep the essential table structure.

But I need EVERYTHING ELSE to get stripped out.
I don't want classes of any sort, fonts, or any weird Micro$oft
stuff.

I also want to be able to specify certain character strings that I want
either converted or stripped out entirely, e.g.:
- remove "&nbsp;" completely
- remove "<p></p>" completely
- change weird opening inverted commas into the basic ASCII character: '


I have searched high and low for such a utility!

The only thing that comes close so far is DETAGGER (from jafsoft.com).
But it's very clunky and won't let me do multiple different replacements
at the touch of a single click!

Any (low cost/free) suggestions?

With thanks


Ship
Shiperton Henethe
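The batch of conversions in the post above (remove "&nbsp;", remove empty "<p></p>", straighten curly quotes, all in one go) is essentially an ordered table of find-and-replace rules. Here is a minimal Python sketch of that idea. It assumes the Word file has already been read into a Unicode string using the charset declared in its meta tag; the rule tables and apply_replacements are hypothetical names, not part of DETAGGER or any other tool mentioned in this thread.

```python
import re

# Hypothetical rule table built from the examples in the post.
# Literal rules run first, in order, so "&nbsp;" is gone before the
# empty-paragraph rule looks for <p></p>.
LITERAL_RULES = [
    ("&nbsp;", ""),      # remove non-breaking-space entities completely
    ("\u2018", "'"),     # left single quotation mark  -> ASCII '
    ("\u2019", "'"),     # right single quotation mark -> ASCII '
    ("&lsquo;", "'"),    # the same characters written as named entities
    ("&rsquo;", "'"),
    ("\u201c", '"'),     # left double quotation mark  -> ASCII "
    ("\u201d", '"'),     # right double quotation mark -> ASCII "
]

# Empty paragraphs such as <p></p> or <P class=MsoNormal> </P>.
# The \b stops this from matching <pre> or <param>.
REGEX_RULES = [
    (re.compile(r"<p\b[^>]*>\s*</p>", re.IGNORECASE), ""),
]


def apply_replacements(html: str) -> str:
    """Apply every rule in one pass over the document ("one click")."""
    for old, new in LITERAL_RULES:
        html = html.replace(old, new)
    for pattern, new in REGEX_RULES:
        html = pattern.sub(new, html)
    return html


if __name__ == "__main__":
    sample = "<p class=MsoNormal>&nbsp;</p><p>\u2018Hi\u2019</p>"
    print(apply_replacements(sample))  # <p>'Hi'</p>
```

Keeping the rules in a list makes the order explicit and means adding a new conversion is one more line.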


#2 Edwin van der Vaart, 04-14-2007, 03:44 PM
Re: Recommend a utility to clean up msOffice HTML, but leave certain basic formatting?

ship wrote:
Quote:
Hi

Can anyone recommend a really powerful utility to strip out Microsoft
Word (2003) garbage from HTML?
Sorry, I can't help you with that.
But why are you using Word to make web pages?
There are a lot of HTML editors, for example:

HTML editors for free:
jedit : http://www.jedit.org/
nedit : http://www.nedit.org/
ewisoft : http://www.ewisoft.com/
netpadd : http://www.netpadd.com/
araneae : http://www.ornj.net/software/araneae/
1st page : http://www.evrsoft.com/
crimson : http://crimsoneditor.com/
ezpad : http://www.mmedia.is/ezpad/
acehtml : http://software.visicommedia.com/en/...ehtmlfreeware/
notetab light : http://www.notetab.com/
html-kit : http://www.chami.com/html-kit/
context : http://www.fixedsys.com/context/
pspad : http://www.pspad.com/en/index.html
websmill : http://www.xtreeme.com/websmill/
metapad : http://www.liquidninja.com/metapad/
quanta (linux) : http://freeware.acehtml.com/
tswebeditor : http://tswebeditor.net.tc/
notespad : http://www.newbie.net/NotesPad/index.html
grey matter pro : http://www.pagetutor.com/misc/grey.html
editpad lite : http://www.editpadlite.com/editpadlite.html
stones webwrite : http://www.webwriter.dk/english/index.htm
matizha sublime : http://www.dohnews.com/index.php?mod...&ceid=3&meid=4
nvu : http://www.nvu.com/
SciTE : http://scintilla.sourceforge.net/SciTE.html
Notepad++ : http://notepad-plus.sourceforge.net/uk/site.htm
CSE HTML Validator : http://www.htmlvaildator.com/lite/
Through the web editor : http://koivi.com/WYSIWYG-Editor/
Xinha : http://xinha.python-hosting.com/
FCK editor : http://www.fckeditor.net/
vim : http://www.vim.org/download.php

HTML editors not for free:
textpad : http://www.textpad.com/
notetab : http://www.notetab.com/
editplus : http://www.editplus.com/
ultraedit : http://www.idmcomp.com/
editpad : http://www.editpadpro.com/
hypertext studio : http://www.olsonsoft.com/
namo : http://www.namo.com/products/webeditor/
acehtml pro : http://www.visicommedia.com/acehtml/
ibm websphere : http://www-3.ibm.com/software/webservers/hpbuilder/
spider writer : http://www.actiprosoftware.com/Products/SpiderWriter/
Zeus : http://www.zeusedit.com/
CSE HTML Validator pro : http://www.html.validator.com/

PHP editors for free:
phpedit : http://phpedit.org/
Winsyntax : http://www.dirfile.com/arisesoft_winsyntax.htm
devphp : http://devphp.sourceforge.net/
phpcoder : http://www.phpide.com/programs.php
Davor's PHP Editor : http://www.pleskina.com/dphped/main.php
php designer : http://www.mpsoftware.dk/

PHP editors not for free:
phped : http://www.nusphere.com/products/
top php studio : http://www.top-systems.net/
dzsoft php editor : http://www.dzsoft.com/dzphp.htm
Expert Editor : http://www.ankord.com/phpxedit.html
komodo : http://www.activestate.com/Products/Komodo/
maguma studio/workbench : http://www.maguma.com/

PHP editors coming soon:
HydraPHP : http://www.coldmind.com/

XML editors for free:
xmlpro2 : http://www.vervet.com/
cooktop : http://www.xmlcooktop.com/
xray : http://architag.com/xray/
peters xml editor : http://www.iol.ie/~pxe/
morphon : http://www.morphon.com/xmleditor/index.shtml

XML editors not for free:
xopus : http://xopus.com/
editml : http://www.editml.com/
xmlwriter : http://www.xmlwriter.net/
oxygen pro : http://www.oxygenxml.com/
blueprint : http://www.xmlblueprint.com/
xmlspy : http://www.xmlspy.com/products_ide.html
turboxml : http://www.tibco.com/software/busine...n/turboxml.jsp

XML editors with both free and paid editions:
xmlmind : http://www.xmlmind.com/xmleditor/
--
Edwin van der Vaart
http://www.semi-conductor.nl/ Links to Semiconductors sites
http://www.evandervaart.nl/ Edwin's persoonlijke web site
Explicitly no permission given to Forum4Designers, onlinemarketingtoday,
24help.info, issociate.de, velocityreviews, umailcampaign.com,
gthelp.com, webfrustration.com, excip.com and many other to duplicate
this post.


#3 Mumia W., 04-15-2007, 03:33 AM
Re: Recommend a utility to clean up msOffice HTML, but leave certain basic formatting?

On 04/14/2007 07:44 AM, ship wrote:
Quote:
Hi

Can anyone recommend a really powerful utility to strip out Microsoft
Word (2003) garbage from HTML?
[...]
At one time I had to use HTML Tidy¹ to strip extraneous MS tags and
attributes; nowadays, however, I'd use Perl along with HTML::Parser.

-------------------
¹ http://www.w3.org/People/Raggett/tidy/

--
Count the YOYOs:
http://home.earthlink.net/~mumia.w.18.spam/games_fever/
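The HTML::Parser route Mumia mentions is a whitelist approach: walk the tags and re-emit only the ones you want. The sketch below swaps in Python's standard-library html.parser.HTMLParser for Perl's HTML::Parser, so it illustrates the idea rather than reproducing anyone's actual script. The ALLOWED_TAGS set is an assumption based on ship's list (bold, italics, bullets, basic tables), with p and br added to keep paragraph breaks. All attributes, including class and style, are dropped, as are comments (where Word hides its conditional blocks) and the contents of style and script elements.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist, based on the post: bold, italics, bullets and basic
# tables, plus p/br so paragraph breaks survive. Everything else is dropped.
ALLOWED_TAGS = {"b", "strong", "i", "em", "ul", "ol", "li",
                "table", "tr", "td", "p", "br"}
# Elements whose text content should be discarded, not just their tags.
DROP_CONTENT = {"style", "script"}


class WhitelistCleaner(HTMLParser):
    """Re-emit only whitelisted tags, with all attributes removed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attrs (class, style, ...) ignored

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag != "br" and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is re-escaped because convert_charrefs decoded the entities.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # Comments (including Word's <!--[if ...]> blocks) and marked sections
    # are ignored: handle_comment and unknown_decl keep the base-class no-ops.


def clean(html: str) -> str:
    parser = WhitelistCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    word_html = ('<p class=MsoNormal><b><span style="font-size:10pt">'
                 'Total</span></b><o:p></o:p></p>')
    print(clean(word_html))  # <p><b>Total</b></p>
```

A whitelist suits this job better than a blacklist, because anything Word invents that is not in the set simply disappears. Two caveats: because entities are decoded, "&nbsp;" comes out as a literal non-breaking space character, and the sketch does not repair misnested or unclosed tags, so running HTML Tidy first may still help.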


#4 andrew, 04-15-2007, 03:43 AM
Re: Recommend a utility to clean up msOffice HTML, but leave certain basic formatting?

On Sat, 14 Apr 2007 21:44:34 +0200
Edwin van der Vaart <e.vandervaart (AT) want (DOT) nospam.com> wrote:

[...]

Quote:
HTML editors for free:
[...]
Quote:
http://www.liquidninja.com/metapad/ quanta (linux)
[...]

Hi Edwin,

Great list!! I suspect that you have this filed somewhere, so I hope
you don't mind if I suggest two more great free Linux HTML editors
to add to it:

Bluefish: http://bluefish.openoffice.nl/
Screem: http://www.screem.org/

Quanta I am not sure about. Do you mean Quanta Plus, which is found
at http://quanta.kdewebdev.org/ ?

I am a huge fan of BlueFish, which has a competing twin brother,
Screem :-) I have not tried Quanta Plus, as it is meant for the KDE
desktop.

All the best,

Andrew

--
Andrew's Corner
http://people.aapt.net.au/~adjlstrong/


#5 Edwin van der Vaart, 04-15-2007, 08:49 AM
Re: Recommend a utility to clean up msOffice HTML, but leave certain basic formatting?

andrew wrote:
Quote:
[...]
Bluefish: http://bluefish.openoffice.nl/
Screem: http://www.screem.org/
Just added to the editor list.

Quote:
Quanta I am not sure about. Do you mean Quanta Plus which is found
at: http://quanta.kdewebdev.org/?
The link to Quanta has changed. At first it was Quanta, but Quanta Plus
will do.
Thank you for pointing out the broken/changed link.

Quote:
I am a huge fan of BlueFish which has the competitive twin-brother
Screem :-) I have not tried Quanta Plus as it is meant for KDE
Desktop.
When I make a web page, I use Notepad (Windows), vi (Linux) or joe (Linux).
--
Edwin van der Vaart


#6 edgy, 04-15-2007, 03:38 PM
Re: Recommend a utility to clean up msOffice HTML, but leave certain basic formatting?

ship wrote:
Quote:
Hi

Can anyone recommend a really powerful utility to strip out Microsoft
Word (2003) garbage from HTML?

[...]
Just copying the text from Word, opening NVU and selecting "paste
without formatting" (or something like that) has worked well for me when
faced with this problem.

Other than that, 10 minutes of search and replace with Notepad can
usually clean up your code pretty well: just leave the "replace with"
box blank!


#7 Mhask, 05-01-2007, 06:03 PM
Re: Recommend a utility to clean up msOffice HTML, but leave certain basic formatting?

"ship" <shiphen (AT) gmail (DOT) com> wrote

Quote:
Hi

Can anyone recommend a really powerful utility to strip out Microsoft
Word (2003) garbage from HTML?

[...]

I've used:

Doc Scrubber v1.1
http://www.docscrubber.com/download.html

Doc Scrubber is provided as freeware for personal and educational use.

This doesn't do a perfect job, but it does cut out a lot of the "dead wood"
and the price is right...



