This week, I committed WebKit changes r92823
They’re perhaps the most important changesets I’ve ever committed to the WebKit codebase
because these changesets made WebKit no longer produce wrapping style spans
on copy and paste and
In fact, these are two changes I’ve always wanted to make ever
since I started working on the WebKit’s editing component in the summer of 2009.
Introduction to Apple style spans
An Apple style span is a HTML span element with the class
It is created whenever WebKit applies style on text by CSS.
document.execCommand('HiliteColor', false, 'blue'); may produce:
<span class="Apple-style-span" style="background-color: rgb(0, 0, 255);">hello world</span>
The initial intent of this was so that WebKit can avoid removing or modifying elements created by the authors and meant to stay by differentiating spans added by WebKit itself and those created by the authors.
We also use an Apple style span to wrap the copied contents to preserve
the style of the copied content.
If you copy
hello world on this page, for example, WebKit puts the following markup
into the pasteboard on Mac:
<meta charset='utf-8'><span style="color: rgb(81, 96, 100);font-family: 'Open Sans', Helvetica, Meiryo, sans-serif; font-size: 16px; font-style: normal;font-variant: normal; font-weight: 300; letter-spacing: normal;line-height: 27px; orphans: 2;text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: 2;word-spacing: 0px; -webkit-text-decorations-in-effect: none;-webkit-text-size-adjust: auto;-webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); ">hello world</span>
Problems with Apple style spans
However, avoiding the modification of spans not created by WebKit turned out to be ineffective
at best because the editing component had to add and remove so many other elements
and WebKit also had to work with elements generated by other browsers and CMS editors.
Also, avoiding the removal of spans without
class="Apple-style-span" caused the markup
to get progressively verbose over time because sometimes we had to cancel
the style added by those elements
<code><b><span style="font-weight: normal;">unbolded text</span></b></code>).
This was particularly apparent on mail clients that used WebKit as the editor
such as Apple’s Mail or Gmail (if the user happens to use a WebKit-based browser).
In some case, an e-email consisting of three lines of text consumed 3MB in HTML
because of nested spans created by WebKit and other mail clients.
An Apple style span that wraps the copied contents can get far worse
if the copied contents include block nodes.
Consider the following markup which annotates
This is title to be a level-1 header:
<h1>This is title</h1>
This is title is copied, WebKit puts the following markup into the pasteboard:
<meta charset='utf-8'><span style="color: rgb(0, 0, 0); font-family: Times; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><h1>This is title</h1></span>
Notice that the h1 is wrapped in a span!
In addition, WebKit used to wrap contents in two spans to retain
the document’s style separately prior to r86983.
font-family: sans-serif was set on the body element
and therefore stored in a separate span below:
<span style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Times; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><span style="font-family: sans-serif; "><h1>This is title</h1></span></span>
If we paste the above example into right where the br element is in
<h1><br></h1>, WebKit produces:
<h1><span style="font-weight: normal; font-size: medium; "><h1>This is title</h1></span></h1>
Here, the span between two nested h1 is canceling the style of the outer h1
because the span is preserving the style of the container
from which contents were copied; i.e. immediately outside of
<h1>This is title</h1>.
This is horrible because neither the span nor the h1 add any semantic
or visual information to the page, and it is invalid markup
under HTML4.01, XHTML1.0, and HTML5.
The Two-Year Project to Remove Apple style spans
When I started working as an intern at Google in the summer of 2009,
this problem caught my attention and I decided to investigate the ways to fix it.
which implements inline style application commands such as
which are responsible for copy and paste respectively all heavily relied on
ReplaceSelectionCommand detected and treated the wrapping spans
generated by markup.cpp on copy very differently from other elements.
I soon realized that removing Apple style spans require the following 3 steps:
ApplyStyleCommandso that it doesn’t depend on Apple style spans
- Improve copy and paste code not to use Apple style spans
- Remove Apple style spans
Since I was an intern at the time and I had only a couple of weeks left,
I decided to focus on the step 1.
So I fixed various bugs in
ApplyStyleCommand and refactored the code:
- Bug 27556 - pushDownTextDecorationStyleAroundNode needs clean up
- Bug 27476 - execCommand(‘underline’/‘strikeThrough’) doesn’t work properly with multiple styles in text-decoration
- Bug 27325 - copyInheritableProperties and removeComputedInheritablePropertiesFrom should be deprecated
- Bug 20215 - execCommand can’t remove presentational tags (u, s, & strike)
- Bug 24333 - execCommand(‘underline’) can modify DOM outside of the contentEditable area
- Bug 27660 - createMarkup does not handle CSS properly
- Bug 27809 - REGRESSION(r46370-46426): /editing/style/remove-underline-from-stylesheet.html fails
- Bug 27749 - StyleChange::init needs clean up before fixing the bug 23892
- Bug 23892 - execCommand(‘Underline’) uses CSS even when styleWithCSS has been turned off
- Bug 28091 - ApplyStyleCommand removes presentational tags even when not necessary
- Bug 30784 - WebKit cannot remove nested bold tags
When I came back to Google as a full-time employee, a year later, I continued to fix and refactor this class:
- Bug 39989 - Inline elements with contenteditable - applying RichText crashes browser
- Bug 43437 - extractAndNegateTextDecorationStyle and maxRangeOffset in ApplyStyleCommand.cpp should be deleted
- Bug 43699 - Use getIdentifierValue to obtain direction and unicode-bidi properties in ApplyStyleCommand
- Bug 43465 - fontColorChangesComputedStyle, fontSizeChangesComputedStyle, and fontFaceChangesComputedStyle should be removed
- Bug 26871 - Can’t unbold text in div in font-weight span
- Bug 30836 - Creating a link when selecting multiple nodes creates multiple links
- Bug 44622 - implicitlyStyledElementShouldBeRemovedWhenApplyingStyle, removeHTMLFontStyle, and removeHTMLBidiEmbeddingStyle should be merged
- Bug 44560 - cannot remove text-decoration when style is added by u or s
- Bug 44458 - ApplyStyleCommand::applyInlineStyle needs cleanup
- Bug 25086 - Can’t unbold bolded list item when list is surrounded by
- Bug 45008 - applyInlineStyleToRange needs cleanup
- Bug 45026 - REGRESSION: applying new font size causes font-size outside selection to change
- Bug 45525 - REGRESSION(r67176): editing/selection/doubleclick-inline-first-last-contenteditable.html crashes
- Bug 28968 - execCommand(‘fantasize’) on certain selected html generates too many SPAN tags
- Bug 45632 - REGRESSION: In Gmail, a crash occurs at getDoubleValue() when applying a text color to a new line
- Bug 45616 - applyInlineStyleToNodeRange does not extend a run properly
- Bug 46205 - cleanup: removeInlineStyleFromElement and extractInlineStyleToPushDown should be merged
- Bug 45568 - WebKit nests font element when applying different font styles
- Remove conflicting styles (e.g. if we’re italicizing text, then remove all instances of font-style properties with values other than italic).
- For each inline runs, remove all styles that match the style being applied (e.g. if we’re italicizing text, then we remove all font-style properties, em, and i).
- Wrap each inline runs with appropriate element or a span with style appropriate attribute; or add appropriate properties to an existing element that wraps each run.
I’m quite proud of this algorithm myself since it produces very clean markup
at the end (current WebKit implementation has a bug in pushing down styles).
After I had made some progress in refactoring
I started cleaning up DOM serialization code in markup.cpp as well
which is responsible for generating two wrapping spans.
But there were a couple of obstacles I had to deal with:
- There are two conflicting
createMarkupfunctions one used for copy and another one used for innerHTML, and they shared code by means of calling functions instead of a class hierarchy. This made it hard to modify the interface of each function and do the necessary refactoring to avoid adding wrapping style spans.
- createMarkup used for copy was a 250-line long function that serialized range, determined the highest ancestor to serialize, and added wrapping spans. It made it extremely hard to see which variable or condition depends on what.
- Various functions in markup.cpp manipulated
CSSMutableStyleDeclarationbut the intentions of them and implications on paste code were not obvious.
To address points 1 and 2, I decided to do a massive refactoring of markup.cpp.
Since darin had already introduced MarkupAccumulator
(Darin always has the best idea for refactoring!) for the innerHTML version of createMarkup,
I decided to introduce
StylizedMarkupAccumulator that inherits from
for the copy version of
As usual, this resulted in an army of bugs:
- Bug 43227 - Group functions used in createMarkup (range version) into a class so they are easier to understand
- Bug 43405 - Extract a function that serializes nodes from the range version of createMarkup
- Bug 43834 - merge MarkupAccumulator and MarkupAccumulatorWrapper
- Bug 43936 - Group functions in markup.cpp into MarkupAccumulatorWrapper
- Bug 44229 - Range, EAnnotateForInterchange, and EAbsoluteURLs should be member variables of MarkupAccumulator
- Bug 44318 - style correction in markup.cpp
- Bug 44288 - MarkupAccumulator::appendStartMarkup should be broken down into pieces
- Bug 44831 - The logic to escape entities in appendEscapedContent and appendAttributeValue should be merged
- Bug 44854 - MarkupAccumulator should be broken down into two classes
- Bug 45449 - Extract the code to find special ancestors in createMarkup into a function
- Bug 45624 - Move functions of StyledMarkupAccumulator below that of MarkupAccumulator
- Bug 44833 - Each EntityMaskIn* needs explanation
- Bug 47749 - serializeNodesWithNamespaces should be in MarkupAccumulator
- Bug 47846 - elementCannotHaveEndTag should be a member function of MarkupAccumulator
After all these patches,
markup.cpp started looking really clean and nice
(Note that abarth extracted MarkupAccumulator.cpp
shortly before I finished all the refactoring).
StylizedMarkupAccumulator provided a perfect abstraction
for getting rid of wrapping spans,
and various refactoring made clear that this is feasible.
Removing Apple style spans
Now I was able tackle point 3, removing Apple style spans.
For me to get rid of
Apple-style-span, I had to fully understand how WebKit preserves styles
and how various parts of the editing component manipulate and interpret the style information.
Meanwhile, I had realized the fact various parts of editing component
CSSMutableStyleDeclaration is problematic
because of tricky properties like background-color and text-decoration
from my prior experience with
Even seemingly simple font-weight is hard to deal with
because it can take numeric values such as 700 and 400 or keywords such as bold and normal.
So I introduced a new layer of abstraction, so called
between the editing component and the CSS component to centralizes
all style manipulation code in one place:
- Bug 46335 - Add EditingStyle
- Bug 49155 - Remove the remaining editing-style functions from ApplyStyleCommand
- Bug 49938 - ApplyStyleCommand should take EditingStyle instead of CSSStyleDeclaration
- Bug 50641 - ApplyStyleCommand::applyRelativeFontStyleChange should take EditingStyle*
- Bug 53911 - Deploy EditingStyle in applyBlockStyle and applyInlineStyle
- Bug 54528 - Deploy EditingStyle more in ApplyStyleCommand and do some cleanup
- Bug 54944 - Deploy EditingStyle in removeInlineStyleFromElement and removeCSSStyle
- Bug 54933 - Make Editor::selectionComputedStyle return EditingStyle
- Bug 55025 - Refactor HTMLEquivalent into a hierachy of classes
- Bug 55207 - Move HTMLEquivalent and its subclasses to EditingStyle
- Bug 55338 - applyInlineStyleToPushDown and removeInlineStyleFromElement should take EditingStyle
- Bug 55349 - WebKit does not merge text decorations in the typing style and the selected element properly
- Bug 55950 - addInlineStyleIfNeeded should take EditingStyle
- Bug 55974 - Move StyleChange and other global functions from ApplyStyleCommand to EditingStyle
- Bug 61887 - ApplyStyleCommand shouldn’t call collapseTextDecorationProperties
I’m extremely happy about this on-going refactoring as it has been reducing the code duplication and caught many hidden bugs.
Now, it was about time. I had addressed all three points that blocked me from getting rid of wrapping style spans on copy. So I started my epic attempt to get rid of wrapping style spans in May, 2011. This was not an easy job because we use copy and paste code as a part of some other editing commands, and in fact, I spent almost an entire week just to create a prototype. Since I normally submit five or more patches a week, spending an entire week on one patch that can’t even be submitted for a review was very unusual. But it paid off at the end. I was able to come up with a patch that gets rid of wrapping spans and does not regress a single test:
- Bug 60988 - Wrap copied contents by one style span instead of two
- Bug 60914 - REGRESSION(r84311): WebKit copies too much styles when copying
- Bug 61466 - WebKit duplicates styles from css rules on copy and paste
- Bug 65833 - Remove redundant inline styles from the pasted contents more aggressively
- Bug 65824 - Repeated copy and paste result in nested font elements
- Bug 34564 - Copying can result in span around block elements on the clipboard
Now, recall my list of things to do in order to remove Apple style spans:
Improve ApplyStyleCommand so that it doesn’t depend on Apple style spans Improve copy and paste code not to use Apple style spans
- Remove Apple style spans
Yes, I was only left with step 3 when I landed the patch for 34564 this Wednesday. So I went ahead and finished off step 3 of this two-year project:
- Bug 66091 - Share code between isStyleSpanOrSpanWithOnlyStyleAttribute, isUnstyledStyleSpan, isSpanWithoutAttributesOrUnstyleStyleSpan and replaceWithSpanOrRemoveIfWithoutAttributes
- Bug 12248 - Apple-style-span class seems unnecessary
And there you go. WebKit revision 93001 that no longer produces Apple style spans. My (and perhaps your) dream has come true.
Of course, all of this could not happen without support from the following people and the entire WebKit community, whom I sincerely thank:
- Darin Adler
- Enrica Casucci
- Eric Seidel
- Julie Parent
- Justin Garcia
- Levi Weintraub
- Ojan Vafai
- Tony Chang