Position and Anchor Types
The Position
class is one of the most important classes in WebKit's editing component.
Position objects are used in editing, accessibility, hit testing,
and other components to represent a point in DOM.
To illustrate this idea, consider the following markup:
<div>This is <b>bolded</b></div>
When this markup is parsed by WebKit, a DOM tree like this will be generated (partial, of course, because HTML5 mandates that the document and the body nodes should automatically be generated):
div
#text("This is ")
b
#text("bolded")
Here, #text(X)
represents a text node with text X
.
On many user agents (e.g. web browsers),
users can place and manipulate the caret (a.k.a. insertion point)
and selection around text by mouse or keyboard.
In WebKit, Position
objects represent end points of caret and selection.
We have the VisibleSelection
class
that can represent an arbitrary caret or selection using two Position objects
and the FrameSelection
class
that controls the selection of each frame.
For example, if you select the text This
in #text("This is")
,
then the VisibleSelection
object that represents this selection has
one Position
object representing a point before T
and another Position
object representing a point after s
.
(containerNode, offset) Pair #
Now, how does one represent a point in DOM?
One way to do this is by using (containerNode, offset) pair as does
the Range interface in DOM.
In this model, a position (containerNode=X, offset=N) is between the children
at offsets N and N+1 of the DOM node X.
e.g. (div, 0) represents the position immediately before #text("This is")
and (div, 1) represents the position immediately after #text("This is")
and immediately before the b element.
Editing Offsets #
While (containerNode, offset) pair representation is the one exposed to the Web
via the Selection
interface,
it poses a bit of challenge when used in editing.
Consider the following DOM now:
div
#text("Here is an image:")
br
br
img
And suppose the caret is after the two br elements and before the img element.
This position can be represented as (div, 3).
Now further suppose that the user decides to remove the two br elements between
#text("Here is an image:")
and the img element to get:
div
#text("Here is an image:")
img
Here, the position (div, 3) is no longer valid because the div element has only 2 children.
To this fix problem, DOM level 2 specifies
how Range should be updated upon DOM mutation,
resulting in
In WebKit, we tried to work-around this issue by introducing editing offsets. In the context of editing offsets, we allow offset 1 with nodes that do not have any children (more precisely, editing ignores content) to represent the positions after the nodes. It also treat offset 0 to represent the positions before the nodes. For example, in the above example, we can use the position (img, 0) to mean the position before the img element and (img, 1) to mean the position after the img element. Under this representation, the positions will remain valid under insertions and deletion of siblings of the img element.
One subtle change we had to make to (containerNode, offset) pair representation is that
containerNode was no longer a container node because depending on the type of the node,
this position can reside before or after the node.
Thus we started calling it an anchor node instead and the Position
became a pair (anchorNode, offset).
Anchor Types #
Unfortunately, the notion of editing offsets really confused the rest of WebKit
and some code in editing component itself because the meaning of offsets changed depending on the node.
For example, (X, 0) could mean either that the position is before the node X or that
the position is first position inside X depending on
whether X's content is ignored by editing or not (i.e. br
, img
, etc...).
So WebKit gurus eseidel
and darin
amongst others came up with the idea of anchor type.
Initially, the anchor type of a position was an enum with 3 values:
PositionIsOffsetInAnchor
, PositionIsBeforeAnchor
, and PositionIsAfterAnchor
.
PositionIsOffsetInAnchor
corresponds to a (node, offset) pair that DOM's Range
interface implements
while PositionIsBeforeInAnchor
and PositionIsAfterAnchor
correspond to
before-node and after-node positions introduced by editing offsets.
So the the position became a 3-tuple (anchorNode, anchorType, offset), and we've added new constructors for Position that explicitly takes an anchor type, and made other constructors auto-compute the anchor type for legacy editing-offset-based positions.
Deprecated node() #
This is about the time when I joined the WebKit team
and realized something strange.
I found that a lot of editing code called Position::node()
,
which returns the anchor node of a position, without checking the anchor type of the position.
Consider the markup <div>a<p>b</p></div>
and its DOM tree:
div
#text("a")
p
#text("b")
Now, if we have a position between #text("a")
and p,
we can represent it as (p, PositionIsBeforeAnchor, 0) where the last value is ignored.
And take a look at the following WebKit code that
I fixed:
Node* enclosingBlockNode = enclosingBlock(m_selection.extent().node());
Here, m_selection.extent()
is a Position
and enclosingBlock
is a function
that returns the closest block ancestor of the specified node
(i.e. enclosingBlock
walks the DOM tree upwards from the given node
and returns the first block node it encounters).
If m_selection.extent()
was (div, PositionIsOffsetAnchor, 0),
then enclosingBlockNode
will be the div
element,
and it'll be the p
element if m_selection.extent()
was (#text("b"), PositionIsBeforeAnchor, 0).
A problem surfaces when we let m_selection.extent()
be (p, PositionIsBeforeAnchor, 0)
In this case, the position resides between #text("a") and the p
element
even though the anchor node is the p
element.
Because node()
returns the p
element, enclosingBlock
will return the p
element itself.
This is clearly wrong because the position resides outside the p
element.
This happens because the code assumed that a position always resides inside its anchor node,
which isn't the case for PositionIsBeforeAnchor
and PositionIsAfterAnchor
anchor types.
Here's another common mistake:
insertNodeBefore(newNode, insertionPos.node())
insertionPos
is a position and newNode
is some node to be inserted.
insertNodeBefore
is a function that inserts the first node
into the parent of the second node immediately before the second node.
This code makes a lot of sense if the anchor type of insertionPos
was PositionIsBeforeAnchor
because then we'll be inserting newNode
right before the insertionPos
.
Unfortunately, the code makes little sense when the anchor type of insertionPos
was PositionIsAfterAnchor
,
in which case newNode
is inserted one element ahead of the insertionPos
.
It makes even less sense when the anchor type is PositionIsOffsetInAnchor
, in which case,
newNode
is inserted before the container node of insertionPos.
After realizing this very fundamental issue present in hundreds of lines of code, I declared Position::node obsolete and added deprecated in its name.
More Anchor Types for Efficiency #
During the 2011 WebKit Contributors' Meeting, we developed further discussion on how to refactor and improve Position and related classes. One of the ideas that came up during the meeting was to restrict the use of offsets to text nodes to allow fast comparison between positions. To see where we're coming from, consider two positions (p, PositionIsBeforeAnchor, 0) and (div, PositionIsOffsetInAnchor, 1) in the following DOM tree:
div
#text("a")
p
#text("b")
They represent the same position between #text("a")
and the p
element.
However, bit-wise comparison of the position fails because their internal representation is different.
To make things worse, WebKit stores the children of a node as a liked list.
This would mean that to compare the above two positions, we must be traversing a linked list,
yielding
To mitigate this problem, we decided to introduce two more anchor types:
PositionIsBeforeChildren
and PositionIsAfterChildren
,
and restrict the use of PositionIsOffsetInAnchor
to only text nodes.
We'll replace (X, PositionIsOffsetInAnchor, N) by (X.childNodes[N], PositionIsBeforeAnchor, N)
when N is less than the number of children and by (X, PositionIsAfterChildren, N)
when N is equal to the number of children when X is a non-text node;
or by (X.childNodes[N-1], PositionIsAfterAnchor, N) when N > 0
and (X, PositionIsBeforeChildren, 0) when N = 0.
To illustrate how this works, let me enumerate all possible positions inside the div
element:
- (div, PositionIsBeforeChildren) = (#text(“a”), PositionIsBeforeAnchor)
- (#text(“a”), PositionIsOffsetInAnchor, 0)
- (#text(“a”), PositionIsOffsetInAnchor, 1)
- (#text(“a”), PositionIsAfterAnchor) = (p, PositionIsBeforeAnchor)
- (p, PositionIsBeforeChilden) = (#text(“b”), PositionIsBeforeAnchor)
- (#text(“b”), PositionIsOffsetInAnchor, 0)
- (#text(“b”), PositionIsOffsetInAnchor, 1)
- (#text(“b”), PositionIsAfterAnchor) = (p, PositionIsAfterChildren)
- (p, PositionIsAfterAnchor) = (div, PositionIsAfterChildren)
Note I've eliminated legacy editing positions and PositionIsOffsetInAnchor
positions for non-text nodes.
Because each position has at most two possible representations
and the anchor node of one representation is always the first child
or the last child of the anchor node of another representation,
comparison between these positions can be done in
Furthermore, the set of positions that use only
PositionIsOffsetInAnchor
, PositionIsBeforeChildren
, and PositionIsAfterAnchor
or only PositionIsOffsetInAnchor
, PositionIsAfterChildren
, and PositionIsBeforeAnchor
will uniquely represent each position in the DOM, making them isomorphic to the set of positions defined
in the DOM level 2 Traversal and Range specification.
Summary #
To summarize,
- WebKit uses
Position
class to store selection end points. - There are 5 anchor types:
PositionIsOffsetInAnchor
,PositionIsBeforeAnchor
,PositionIsAfterAnchor
,PositionIsBeforeChildren
, andPositionIsAfterChildren
- In addition, there's legacy editing positions that use the editing offsets. In editing offsets, 0 means before-anchor and 1 means after-anchor.
- Calling
deprecatedNode()
oranchorNode()
is usually wrong for some types of positions. - We don't want (node, offset, PositionIsOffsetInAnchor) pair for non-text nodes.