Position and Anchor Types

The Position class is one of the most important classes in WebKit’s editing component. Position objects are used in editing, accessibility, hit testing, and other components to represent a point in DOM. To illustrate this idea, consider the following markup:

<div>This is <b>bolded</b></div>

When this markup is parsed by WebKit, a DOM tree like this will be generated (partial, of course, because HTML5 mandates that the document and the body nodes should automatically be generated):

  #text("This is ")

Here, #text(X) represents a text node with text X. On many user agents (e.g. web browsers), users can place and manipulate the caret (a.k.a. insertion point) and selection around text by mouse or keyboard.

In WebKit, Position objects represent end points of caret and selection. We have the VisibleSelection class that can represent an arbitrary caret or selection using two Position objects and the FrameSelection class that controls the selection of each frame. For example, if you select the text This in #text("This is"), then the VisibleSelection object that represents this selection has one Position object representing a point before T and another Position object representing a point after s.

(containerNode, offset) Pair

Now, how does one represent a point in DOM? One way to do this is by using (containerNode, offset) pair as does the Range interface in DOM. In this model, a position (containerNode=X, offset=N) is between the children at offsets N and N+1 of the DOM node X. e.g. (div, 0) represents the position immediately before #text("This is") and (div, 1) represents the position immediately after #text("This is") and immediately before the b element.

Editing Offsets

While (containerNode, offset) pair representation is the one exposed to the Web via the Selection interface, it poses a bit of challenge when used in editing. Consider the following DOM now:

  #text("Here is an image:")

And suppose the caret is after the two br elements and before the img element. This position can be represented as (div, 3). Now further suppose that the user decides to remove the two br elements between #text("Here is an image:") and the img element to get: div #text("Here is an image:") img

Here, the position (div, 3) is no longer valid because the div element has only 2 children. To this fix problem, DOM level 2 specifies how Range should be updated upon DOM mutation, resulting in O(m) performance where m is the number of Range objects attached in the same document.

In WebKit, we tried to work-around this issue by introducing editing offsets. In the context of editing offsets, we allow offset 1 with nodes that do not have any children (more precisely, editing ignores content) to represent the positions after the nodes. It also treat offset 0 to represent the positions before the nodes. For example, in the above example, we can use the position (img, 0) to mean the position before the img element and (img, 1) to mean the position after the img element. Under this representation, the positions will remain valid under insertions and deletion of siblings of the img element.

One subtle change we had to make to (containerNode, offset) pair representation is that containerNode was no longer a container node because depending on the type of the node, this position can reside before or after the node. Thus we started calling it an anchor node instead and the Position became a pair (anchorNode, offset).

Anchor Types

Unfortunately, the notion of editing offsets really confused the rest of WebKit and some code in editing component itself because the meaning of offsets changed depending on the node. For example, (X, 0) could mean either that the position is before the node X or that the position is first position inside X depending on whether X’s content is ignored by editing or not (i.e. br, img, etc…).

So WebKit gurus eseidel and darin amongst others came up with the idea of anchor type. Initially, the anchor type of a position was an enum with 3 values: PositionIsOffsetInAnchor, PositionIsBeforeAnchor, and PositionIsAfterAnchor. PositionIsOffsetInAnchor corresponds to a (node, offset) pair that DOM’s Range interface implements while PositionIsBeforeInAnchor and PositionIsAfterAnchor correspond to before-node and after-node positions introduced by editing offsets.

So the the position became a 3-tuple (anchorNode, anchorType, offset), and we’ve added new constructors for Position that explicitly takes an anchor type, and made other constructors auto-compute the anchor type for legacy editing-offset-based positions.

Deprecated node()

This is about the time when I joined the WebKit team and realized something strange. I found that a lot of editing code called Position::node(), which returns the anchor node of a position, without checking the anchor type of the position. Consider the markup <div>a<p>b</p></div> and its DOM tree:


Now, if we have a position between #text("a") and p, we can represent it as (p, PositionIsBeforeAnchor, 0) where the last value is ignored. And take a look at the following WebKit code that I fixed:

Node* enclosingBlockNode = enclosingBlock(m_selection.extent().node());

Here, m_selection.extent() is a Position and enclosingBlock is a function that returns the closest block ancestor of the specified node (i.e. enclosingBlock walks the DOM tree upwards from the given node and returns the first block node it encounters). If m_selection.extent() was (div, PositionIsOffsetAnchor, 0), then enclosingBlockNode will be the div element, and it’ll be the p element if m_selection.extent() was (#text(“b”), PositionIsBeforeAnchor, 0).

A problem surfaces when we let m_selection.extent() be (p, PositionIsBeforeAnchor, 0) In this case, the position resides between #text(“a”) and the p element even though the anchor node is the p element. Because node() returns the p element, enclosingBlock will return the p element itself. This is clearly wrong because the position resides outside the p element. This happens because the code assumed that a position always resides inside its anchor node, which isn’t the case for PositionIsBeforeAnchor and PositionIsAfterAnchor anchor types.

Here’s another common mistake:

insertNodeBefore(newNode, insertionPos.node())

insertionPos is a position and newNode is some node to be inserted. insertNodeBefore is a function that inserts the first node into the parent of the second node immediately before the second node. This code makes a lot of sense if the anchor type of insertionPos was PositionIsBeforeAnchor because then we’ll be inserting newNode right before the insertionPos. Unfortunately, the code makes little sense when the anchor type of insertionPos was PositionIsAfterAnchor, in which case newNode is inserted one element ahead of the insertionPos. It makes even less sense when the anchor type is PositionIsOffsetInAnchor, in which case, newNode is inserted before the container node of insertionPos.

After realizing this very fundamental issue present in hundreds of lines of code, I declared Position::node obsolete and added deprecated in its name.

More Anchor Types for Efficiency

During the 2011 WebKit Contributors' Meeting, we developed further discussion on how to refactor and improve Position and related classes. One of the ideas that came up during the meeting was to restrict the use of offsets to text nodes to allow fast comparison between positions. To see where we’re coming from, consider two positions (p, PositionIsBeforeAnchor, 0) and (div, PositionIsOffsetInAnchor, 1) in the following DOM tree:


They represent the same position between #text("a") and the p element. However, bit-wise comparison of the position fails because their internal representation is different. To make things worse, WebKit stores the children of a node as a liked list. This would mean that to compare the above two positions, we must be traversing a linked list, yielding O(n) performance where n is the number of nodes in the DOM. This had also caused problems in transitioning from legacy editing positions to new explicitly typed positions because a lot of position comparison didn’t take these differences into account.

To mitigate this problem, we decided to introduce two more anchor types: PositionIsBeforeChildren and PositionIsAfterChildren, and restrict the use of PositionIsOffsetInAnchor to only text nodes. We’ll replace (X, PositionIsOffsetInAnchor, N) by (X.childNodes[N], PositionIsBeforeAnchor, N) when N is less than the number of children and by (X, PositionIsAfterChildren, N) when N is equal to the number of children when X is a non-text node; or by (X.childNodes[N-1], PositionIsAfterAnchor, N) when N > 0 and (X, PositionIsBeforeChildren, 0) when N = 0.

To illustrate how this works, let me enumerate all possible positions inside the div element:

  1. (div, PositionIsBeforeChildren) = (#text(“a”), PositionIsBeforeAnchor)
  2. (#text(“a”), PositionIsOffsetInAnchor, 0)
  3. (#text(“a”), PositionIsOffsetInAnchor, 1)
  4. (#text(“a”), PositionIsAfterAnchor) = (p, PositionIsBeforeAnchor)
  5. (p, PositionIsBeforeChilden) = (#text(“b”), PositionIsBeforeAnchor)
  6. (#text(“b”), PositionIsOffsetInAnchor, 0)
  7. (#text(“b”), PositionIsOffsetInAnchor, 1)
  8. (#text(“b”), PositionIsAfterAnchor) = (p, PositionIsAfterChildren)
  9. (p, PositionIsAfterAnchor) = (div, PositionIsAfterChildren)

Note I’ve eliminated legacy editing positions and PositionIsOffsetInAnchor positions for non-text nodes. Because each position has at most two possible representations and the anchor node of one representation is always the first child or the last child of the anchor node of another representation, comparison between these positions can be done in O(1).

Furthermore, the set of positions that use only PositionIsOffsetInAnchorPositionIsBeforeChildren, and PositionIsAfterAnchor or only PositionIsOffsetInAnchorPositionIsAfterChildren, and PositionIsBeforeAnchor will uniquely represent each position in the DOM, making them isomorphic to the set of positions defined in the DOM level 2 Traversal and Range specification.


To summarize,