Apple style span is gone

This week, I committed WebKit changes r92823 and r93001. They’re perhaps the most important changesets I’ve ever committed to the WebKit codebase because these changesets made WebKit no longer produce wrapping style spans on copy and paste and class="Apple-style-span" anymore. In fact, these are two changes I’ve always wanted to make ever since I started working on the WebKit’s editing component in the summer of 2009.

Introduction to Apple style spans

An Apple style span is a HTML span element with the class Apple-style-span. It is created whenever WebKit applies style on text by CSS. For example, document.execCommand('HiliteColor', false, 'blue'); may produce:

<span class="Apple-style-span" style="background-color: rgb(0, 0, 255);">hello world</span>

The initial intent of this was so that WebKit can avoid removing or modifying elements created by the authors and meant to stay by differentiating spans added by WebKit itself and those created by the authors.

We also use an Apple style span to wrap the copied contents to preserve the style of the copied content. If you copy hello world on this page, for example, WebKit puts the following markup into the pasteboard on Mac:

<meta charset='utf-8'><span style="color: rgb(81, 96, 100);font-family: 'Open Sans', Helvetica, Meiryo, sans-serif; font-size: 16px; font-style: normal;font-variant: normal; font-weight: 300; letter-spacing: normal;line-height: 27px; orphans: 2;text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: 2;word-spacing: 0px; -webkit-text-decorations-in-effect: none;-webkit-text-size-adjust: auto;-webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); ">hello world</span>

Problems with Apple style spans

However, avoiding the modification of spans not created by WebKit turned out to be ineffective at best because the editing component had to add and remove so many other elements and WebKit also had to work with elements generated by other browsers and CMS editors. Also, avoiding the removal of spans without class="Apple-style-span" caused the markup to get progressively verbose over time because sometimes we had to cancel the style added by those elements e.g. (<code><b><span style="font-weight: normal;">unbolded text</span></b></code>). This was particularly apparent on mail clients that used WebKit as the editor such as Apple’s Mail or Gmail (if the user happens to use a WebKit-based browser). In some case, an e-email consisting of three lines of text consumed 3MB in HTML because of nested spans created by WebKit and other mail clients.

An Apple style span that wraps the copied contents can get far worse if the copied contents include block nodes. Consider the following markup which annotates This is title to be a level-1 header:

<h1>This is title</h1>

When This is title is copied, WebKit puts the following markup into the pasteboard:

<meta charset='utf-8'><span style="color: rgb(0, 0, 0); font-family: Times; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><h1>This is title</h1></span>

Notice that the h1 is wrapped in a span! In addition, WebKit used to wrap contents in two spans to retain the document’s style separately prior to r86983. Here, font-family: sans-serif was set on the body element and therefore stored in a separate span below:

<span style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Times; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><span style="font-family: sans-serif; "><h1>This is title</h1></span></span>

If we paste the above example into right where the br element is in <h1><br></h1>, WebKit produces:

<h1><span style="font-weight: normal; font-size: medium; "><h1>This is title</h1></span></h1>

Here, the span between two nested h1 is canceling the style of the outer h1 because the span is preserving the style of the container from which contents were copied; i.e. immediately outside of <h1>This is title</h1>. This is horrible because neither the span nor the h1 add any semantic or visual information to the page, and it is invalid markup under HTML4.01, XHTML1.0, and HTML5.

The Two-Year Project to Remove Apple style spans

When I started working as an intern at Google in the summer of 2009, this problem caught my attention and I decided to investigate the ways to fix it. However, ApplyStyleCommand which implements inline style application commands such as execCommand('bold') and execCommand('italitc'), and markup.cpp and ReplaceSelectionCommand which are responsible for copy and paste respectively all heavily relied on the classname Apple-style-span. In particular, ReplaceSelectionCommand detected and treated the wrapping spans generated by markup.cpp on copy very differently from other elements. I soon realized that removing Apple style spans require the following 3 steps:

Since I was an intern at the time and I had only a couple of weeks left, I decided to focus on the step 1. So I fixed various bugs in ApplyStyleCommand and refactored the code:

When I came back to Google as a full-time employee, a year later, I continued to fix and refactor this class:

As a result of all these improvements, I have devised a a style application algorithm which is now partially adopted by Aryeh’s editing spec. It’s a three-phase algorithm described as below:

I’m quite proud of this algorithm myself since it produces very clean markup at the end (current WebKit implementation has a bug in pushing down styles). After I had made some progress in refactoring ApplyStyleCommand, I started cleaning up DOM serialization code in markup.cpp as well which is responsible for generating two wrapping spans. But there were a couple of obstacles I had to deal with:

  1. There are two conflicting createMarkup functions one used for copy and another one used for innerHTML, and they shared code by means of calling functions instead of a class hierarchy. This made it hard to modify the interface of each function and do the necessary refactoring to avoid adding wrapping style spans.
  2. createMarkup used for copy was a 250-line long function that serialized range, determined the highest ancestor to serialize, and added wrapping spans. It made it extremely hard to see which variable or condition depends on what.
  3. Various functions in markup.cpp manipulated CSSMutableStyleDeclaration but the intentions of them and implications on paste code were not obvious.

To address points 1 and 2, I decided to do a massive refactoring of markup.cpp. Since darin had already introduced MarkupAccumulator (Darin always has the best idea for refactoring!) for the innerHTML version of createMarkup, I decided to introduce StylizedMarkupAccumulator that inherits from MarkupAccumulator for the copy version of createMarkup. As usual, this resulted in an army of bugs:

After all these patches, markup.cpp started looking really clean and nice (Note that abarth extracted MarkupAccumulator.cpp shortly before I finished all the refactoring). In fact, StylizedMarkupAccumulator provided a perfect abstraction for getting rid of wrapping spans, and various refactoring made clear that this is feasible.

Removing Apple style spans

Now I was able tackle point 3, removing Apple style spans. For me to get rid of Apple-style-span, I had to fully understand how WebKit preserves styles and how various parts of the editing component manipulate and interpret the style information. Meanwhile, I had realized the fact various parts of editing component directly manipulate CSSMutableStyleDeclaration is problematic because of tricky properties like background-color and text-decoration from my prior experience with ApplyStyleCommand. Even seemingly simple font-weight is hard to deal with because it can take numeric values such as 700 and 400 or keywords such as bold and normal. So I introduced a new layer of abstraction, so called EditingStyle, between the editing component and the CSS component to centralizes all style manipulation code in one place:

I’m extremely happy about this on-going refactoring as it has been reducing the code duplication and caught many hidden bugs.

Now, it was about time. I had addressed all three points that blocked me from getting rid of wrapping style spans on copy. So I started my epic attempt to get rid of wrapping style spans in May, 2011. This was not an easy job because we use copy and paste code as a part of some other editing commands, and in fact, I spent almost an entire week just to create a prototype. Since I normally submit five or more patches a week, spending an entire week on one patch that can’t even be submitted for a review was very unusual. But it paid off at the end. I was able to come up with a patch that gets rid of wrapping spans and does not regress a single test:

Now, recall my list of things to do in order to remove Apple style spans:

Yes, I was only left with step 3 when I landed the patch for 34564 this Wednesday. So I went ahead and finished off step 3 of this two-year project:

And there you go. WebKit revision 93001 that no longer produces Apple style spans. My (and perhaps your) dream has come true.


Of course, all of this could not happen without support from the following people and the entire WebKit community, whom I sincerely thank: