Copy from MS Word, Paste into a Rich Text WYSIWYG editor

This title will send chills up the spine of web developers & content authors everywhere. Web developers fear the bloated markup this action produces. Content authors fear the difficulty of combining their favorite authoring environment with their CMS’s editor.

Why is copy & paste a problem on the web?

The problem isn’t copy & paste.  The problem is WHAT is being copied & pasted.

Plain text (content without any styling) is completely safe to paste into a Rich Text editor.  However, rich text content consists of 1) text and 2) styling.

This sentence has a bolded word.

In this example, my Rich Text editor added hidden markup around the word “bolded”. This markup instructs the web browser to apply special styling. If this content is copied & pasted into another program, the hidden styling comes along with it.
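To see the hidden part, here is a minimal Python sketch using the standard library's html.parser. It splits an HTML fragment into the text a reader sees and the tags they don't. Both fragments are assumptions written for illustration: the first is what a web editor might store for the sentence above, the second is a simplified, hand-written approximation of the kind of markup Word tends to put on the clipboard (MsoNormal classes, mso- style properties, <o:p> tags). Real Word output is much longer.

```python
from html.parser import HTMLParser

# Illustrative fragments only; real clipboard HTML from Word is far larger.
EDITOR_HTML = "<p>This sentence has a <strong>bolded</strong> word.</p>"
WORD_HTML = (
    '<p class="MsoNormal"><span style="font-family:Calibri;mso-bidi-font-family:Arial">'
    'This sentence has a <b style="mso-bidi-font-weight:normal">bolded</b> word.'
    "</span><o:p></o:p></p>"
)


class MarkupSplitter(HTMLParser):
    """Separates what the reader sees (text) from what they don't (tags + attributes)."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))

    def handle_data(self, data):
        self.text_parts.append(data)


def split_markup(html):
    """Return (visible_text, list_of_opening_tags) for an HTML fragment."""
    parser = MarkupSplitter()
    parser.feed(html)
    parser.close()
    return "".join(parser.text_parts), parser.tags


for label, html in (("editor", EDITOR_HTML), ("word", WORD_HTML)):
    text, tags = split_markup(html)
    print(f"{label}: visible={text!r} hidden_tags={len(tags)} chars={len(html)}")
```

Both fragments print the same visible sentence; only the amount of hidden markup differs, and that difference is what rides along with a paste.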

And despite what you think, this isn’t what you want…

MS Word is not good at creating web sites

There are plenty of choices for accessing the web (PC, Mac, phones, iPad, IE, Chrome, Firefox, Opera, etc.). Ideally, a web site needs to function reliably in all of these environments.

To address this challenge, web developers establish styling for the entire web site.  This styling, in addition to creating a consistent visual experience, enables the web site to be adapted for each device or browser.

By importing styling from MS Word, authors circumvent their web site’s styling.

Insidious hidden styling that accompanies copy & paste actions from MS Word

As a result, styling that worked wonderfully in one environment (MS Word) will behave very poorly in another environment (your web site).   Even if it looks okay during publishing, this imported styling will create insidious long-term issues for the web site.

What’s the solution to copy & paste?

As described above, the embedded styling (found in copy & pasted content) is the problem.  Consequently, the solution is simple and obvious:

Copy the text, but remove the styling.
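In code, “copy the text, but remove the styling” is roughly what a Paste as Plain Text button does. A minimal Python sketch, assuming the pasted content arrives as an HTML string: keep the text, turn paragraph-level tags into line breaks, and discard the contents of <style> and <script> (Word's clipboard HTML typically carries a <style> block). The set of block tags is an illustrative choice, not taken from any particular editor.

```python
from html.parser import HTMLParser

# Tags whose boundaries become line breaks (an assumed, minimal set).
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6", "tr"}
# Tags whose contents are not visible text.
SKIP_CONTENT_TAGS = {"style", "script", "head", "title"}


class PlainTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <style>, <script>, ...

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_plain_text(html):
    """Keep the words; drop every tag, class and style attribute."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace (including &nbsp;) and drop empty lines.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

Everything else (classes, inline styles, <span> and <o:p> wrappers) simply disappears, which is also why the pasted content can look so different afterwards.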

Towards this end, special ‘paste’ buttons are popular with many Rich Text editors:

8 icons devoted to copying & pasting in a Rich Text WYSIWYG editor

However, this is a ridiculous waste of toolbar real estate.  The Rich Text editor should automatically clean pasted content.  The alternative is educating end-users regarding which of these 8 buttons they should click.

All major Rich Text solutions (TinyMCE, CKEditor, RadEditor) have options for automatically cleaning pasted rich text content.
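Each editor implements and configures its cleanup differently. As an illustration only (not the actual code of TinyMCE, CKEditor or RadEditor), here is a whitelist-style cleaner in Python: a few structural tags survive, <b> and <i> are renamed to <strong> and <em>, links keep only their href, every class and style attribute is dropped, and any other tag is unwrapped so its text is kept. The ALLOWED and RENAME tables are hypothetical defaults.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag -> attributes allowed to survive.
ALLOWED = {"p": set(), "a": {"href"}, "strong": set(), "em": set(),
           "ul": set(), "ol": set(), "li": set(), "br": set()}
RENAME = {"b": "strong", "i": "em"}  # presentational -> semantic
SKIP_CONTENT_TAGS = {"style", "script", "head", "title"}
VOID_TAGS = {"br"}


class PasteCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT_TAGS:
            self.skip_depth += 1
            return
        tag = RENAME.get(tag, tag)
        if self.skip_depth or tag not in ALLOWED:
            return  # unwrap: drop the tag itself, its text is still emitted
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in ALLOWED[tag]  # class/style never make the list
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        tag = RENAME.get(tag, tag)
        if self.skip_depth or tag not in ALLOWED or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_pasted_html(html):
    cleaner = PasteCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)
```

A whitelist is easier to reason about than a blacklist of Word quirks, because anything unexpected is dropped by default. This sketch is not a security sanitizer; for example, it does not filter javascript: URLs in href values.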

This solution has a downside though:

When styling is removed the content will look radically different.  This requires content authors to reapply missing styling within the Rich Text editor.  By doing this, authors are replacing MS Word styling with web friendly styling.

This solution is unrealistic: content authors will revolt

Everything I’ve written is well known to developers.  Furthermore, features for automatically detecting and cleaning dirty content are widely available.

However, these features are often disabled in the face of user revolt.

Content authors will revolt if you strip away their MS Word styling (or remove font colors)

It’s normal for content authors to react negatively when their nicely formatted MS Word document turns to garbage in the CMS.  These reactions are given credibility since their actions worked fine in another CMS or Rich Text editor.

So…just disable the feature that strips MS Word styling and make them happy…

This will eventually ruin the web site, but the customer is always right.  Right?

Is Clippy the solution to our problems?

This post has now come full circle and we’re no closer to a real-world solution:

  1. Developers remove pasted styling to protect the web site.
  2. Authors create content in their preferred writing environment.
  3. Authors want to move this content to the web site.
  4. Copy & paste is a logical choice.
  5. Authors are confused when everything goes to hell.
  6. Authors complain to developers.
  7. Developers allow pasted styling to make authors stop complaining.

However, as I look over this cascade of events, I see an opportunity for intervention at stage #5.  Education (as much as technology) is the problem.

To address this, here is what I propose:

Rich Text dialog window when pasting from MS Word.  No, it's not like Clippy.

I was chatting with a colleague about this dilemma and showed him this mockup.  He replied with “you want Clippy” and then smiled.  This reply severely shook my faith in my proposal.  I certainly have no desire to interact with Clippy…

Microsoft Clippy - Alive, well and now in your CMS

However, there is a lot I like about this proposal:

  • It doesn’t involve an animated character
  • It empowers authors to make their own choice
  • It educates authors about the consequences
  • It only displays when relevant
  • It contains useful information
  • It will go away

None of these things could be said about Clippy.
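Two of those points, “only displays when relevant” and “will go away”, come down to two small pieces of logic: recognizing Word content on paste, and remembering the author's answer. A hedged Python sketch: the markers are a common heuristic (MsoNormal-style classes, mso- CSS properties, <o:p> tags, the urn:schemas-microsoft-com:office namespace), not an exhaustive or official test, and PasteChoice, PastePreferences and on_paste are hypothetical names rather than any editor's API.

```python
import re
from enum import Enum

# Heuristic fingerprints of Word clipboard HTML (not exhaustive).
WORD_MARKERS = re.compile(
    r"class=[\"']?Mso"              # MsoNormal, MsoListParagraph, ...
    r"|mso-[a-z-]+\s*:"             # mso-* properties inside style attributes
    r"|<o:p>"                       # Office-namespace paragraph tags
    r"|urn:schemas-microsoft-com:office",
    re.IGNORECASE,
)


def looks_like_word(html):
    """True when the pasted HTML carries typical Word fingerprints."""
    return WORD_MARKERS.search(html) is not None


class PasteChoice(Enum):
    """Hypothetical options offered by the paste dialog."""
    KEEP_WORD_STYLING = "keep"
    CLEAN = "clean"
    PLAIN_TEXT = "plain"


class PastePreferences:
    """Stores a remembered answer so the dialog stops appearing."""

    def __init__(self):
        self.remembered = None

    def remember(self, choice):
        self.remembered = choice

    def forget(self):
        # Lets an author undo a remembered choice later.
        self.remembered = None


def on_paste(html, prefs, ask_author):
    """Return None for ordinary (non-Word) content, otherwise a PasteChoice.

    ask_author() shows the dialog and returns (choice, remember_flag).
    """
    if not looks_like_word(html):
        return None  # not relevant: no dialog at all
    if prefs.remembered is not None:
        return prefs.remembered  # the dialog has "gone away"
    choice, remember = ask_author()
    if remember:
        prefs.remember(choice)
    return choice
```

Passing the dialog in as a callback (ask_author) keeps the decision logic testable without a browser; in a real editor it would presumably run inside the paste event handler.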

If you build it, they will come!

Everything described happens because authors avoid writing content in their CMS’s Rich Text editor.  The hacky style stripping & modal windows are completely unnecessary if authors simply type the content in the CMS.

Towards that end, I’m very interested in creating an attractive web-based authoring experience.  Why are authors avoiding web-based authoring tools in favor of off-line tools? How can we change this behavior?

There are some big players (Google Documents, Word Live) that are also wrestling with this challenge.  This topic is covered in another post.

  • http://ckeditor.com FredCK

    Interesting article, Gabe. Actually, we have been following your blog with great pleasure recently.

    CKEditor already does the Word cleanup automatically, either on CTRL+V or through the toolbar “Paste”. There are people who still want the “Paste from Word” option, though, so we have the dedicated button there.

    We’re considering merging all paste buttons into a single one, which defaults to “Paste” but can also do “Paste from Word” and “Paste as Plain Text”. This looks like a good solution for the mess, don’t you think?

  • http://www.smartoarif.com Arif

    Here is the problem I ran into with CKEditor (actually, I’ve seen it in every editor I’ve tried that does the cleaning): it removes my real formatting and styles and keeps only a few of them. For example, I have some centered, bold, slightly larger text in a .docx file. After I copy & paste it into CKEditor, it loses the actual font size; I’m not sure if there is anything special about my doc. I can send the doc if you are interested…

    Note: this doesn’t happen when the editor does not clean up content pasted from Word.

    Any ideas or help?

  • Kevin White

    The classic perfect-world solution vs. real-world results. Unfortunately, I think any solution that relies on the user making the correct, best choice is working on a flawed premise. People are lazy and are generally going to choose the path of least resistance.

    My gut feeling is that 90% of our users will choose “Leave it Alone”. 50% of those people will then end up contacting us to fix the mysterious problems that are occurring on their website (the other 50% are the ones that don’t even bother to look at their content on the actual page after they hit “Save”). I also anticipate a need to provide an interface for undoing the “Remember” selection, as they don’t realize what is really going to happen to their content after they make their selection (and will not wait to try it once to find out).

    That said, I still think this is a step in the right direction and is a better solution/start than the current “State of the Paste”.

    The better solution (and I am certainly not volunteering to build it, just to use it after the smart minds make it), would seem to be to automatically detect the pasting of Word content and fix it, keeping a vast majority of the styling in place (just replacing it with the correct HTML/CSS styling). To the user, nothing has happened. They hit paste and the stuff appears in the editor with no visible difference from the way it looked in Word.

  • http://www.limra.com Eric Crawford

    Experienced content folks I work with tend to ignore all the special copy/paste options and do this for consistency:

    1. Copy/paste into Notepad to remove style junk.
    2. Copy/paste from Notepad into the HTML window.
    3. Reapply the necessary tags/styles.

    If you want to create a fancy paste-styled-text button, it should be customizable or just bring in basic tags like <p>, <a>, <strong>, etc. and nothing more (especially without the tags everywhere that apply and reapply font-size, font-weight, font-face, etc.)

    • http://www.limra.com Eric Crawford

      Oops, my post lost the tags I wanted to include: <a> <p> <strong>

      • Gabe Sumner

        Hey Eric, thanks for the comment. I fixed your post. I love that (even here) we’re struggling with the editor. :)

  • http://www.cmscritic.com Mike Johnston

    Hi Gabe,

    I think you’ve made some excellent points here and really hit the mark on this topic. This is definitely a frustration point for a lot of users, including myself. I would suggest one addition, however. You mentioned “Even if it looks okay during publishing, this imported styling will create insidious long-term issues for the web site.” but didn’t actually highlight what those issues may be.

    That information could be valuable to people unaccustomed to these issues and I’d suggest you add some examples. Great article, thanks for letting me know about it.

    ~Mike

    • http://gabesumner.com Gabe Sumner

      Hey Mike, thanks for the comment.

      Regarding the long-term issues of copy & paste, I had about 2 paragraphs of content typed that I removed because it felt like a large tangent in the midst of the article. I was planning to turn “insidious long-term issues” into a link [later] to another article.

      Here is one example though:

      Web site redesigns are very challenging when the web site is littered with in-line styling. This task will involve messy site-wide search & replace operations. External styling, by contrast, makes this a very quick site-wide change.

      I’ll try to touch on this subject in a future post. Thank you so much for commenting.

  • Pingback: WYSIWYG Rich Text Editors: Your CMS’s Achilles’ heel | Gabe Sumner

  • Renaudgarnier

    Thanks for these thoughts, everybody. I’m working with CKEditor these days to allow visitors of my site to copy and paste their movie scripts onto the site. These documents are not very complicated; I just need to keep a margin-left for dialogue. Other settings can change; they’re not my primary concern.

    What surprises me today is the difference between web browsers. I usually check my site’s development and changes with Safari 5.1 on a Mac, and this morning I found that I can copy from Pages or MS Word and paste into CKEditor without losing any of my document’s styling… So I kept coding the site for hours before I realized that this is an exception. Chrome doesn’t keep the styling, and Mozilla doesn’t paste at all, even with the PasteFromWord option! So… I tried this on Windows with Safari, Chrome, Internet Explorer and Mozilla… Same thing: Mozilla doesn’t accept the paste, and the 3 others lose my page’s styling.

    So I first tried to find a browser that accepts pasting with styles, so I could recommend one for each OS, but I can’t tell users to download a browser just for a paste option. And judging by Google searches, uploading MS Word documents to convert them to HTML looks like a nightmare.

    Now I’m just very confused. I’m not experienced enough to find the easier solution (if one is easier!) between writing a script to convert .doc to .html on the server side OR trying to understand why paste doesn’t work the same way in different browsers!