Rotary Position Embedding, or RoPE, is a type of position embedding that encodes absolute positional information with a rotation matrix and naturally incorporates explicit relative position dependency in the self-attention formulation, as described in the paper RoFormer: Enhanced Transformer with Rotary Position Embedding.
Rotary Position Embedding is defined as follows: for each input query and key, rotary position embedding is applied on the last dimension. The position index starts from start_pos, so the positions of a sequence range from start_pos to start_pos + seqlen. For example, rotary position embedding of the query is performed as:
for b in range(batch):
    for s in range(seqlen):
        for nh in range(num_heads):
            pivot = rotary_dim if rotary_dim else query.shape[-1]
            offset = start_pos + s - pad_len[b]
            rotated_query[b, s, nh, :pivot] = f(query[b, s, nh, :pivot], offset)
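The rotation f above can be sketched in pure Python. This is a minimal illustration of rotating a single head vector, assuming the interleaved pair rotation of the RoFormer formulation (each pair (x[2i], x[2i+1]) is rotated by angle offset * theta**(-2i/d)); the actual kernel operates on whole tensors, but the per-pair math is the same.

```python
import math

def rope_rotate(vec, offset, theta=10000.0):
    # Sketch of f: rotate consecutive pairs (vec[2i], vec[2i+1])
    # by angle offset * theta**(-2i/d). Assumes len(vec) is even.
    d = len(vec)
    out = [0.0] * d
    for i in range(0, d, 2):
        angle = offset * theta ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        out[i] = vec[i] * c - vec[i + 1] * s
        out[i + 1] = vec[i] * s + vec[i + 1] * c
    return out
```

Since the operation only rotates pairs, an offset of 0 leaves the vector unchanged and any offset preserves its norm.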
rotary_dim: How many elements in the last dimension are rotated. Default is 0, which means all elements are rotated; otherwise only the first rotary_dim elements are rotated.
theta: Hyperparameter of rotary position embedding; the base of the rotary frequencies.
Bypass rotating the key, kept for compatibility.
max_position_embeddings: Max position embedding index, used for scaling overflowed position embeddings. Only effective when scaling_type != ''.
scaling_type: Rotary embedding scaling type, applied when the position index is larger than max_position_embeddings.
- '': position = position
- 'linear': position = position / scaling_factor
- 'dynamic': theta = theta * ((scaling_factor * (position + seqlen) / max_position_embeddings) - (scaling_factor - 1)) ** (rotary_dim / (rotary_dim - 2))
scaling_factor: Rotary embedding scaling factor, used when scaling_type != ''.
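The scaling branches above can be condensed into one sketch. scaled_position_and_theta is a hypothetical helper name; the branch bodies follow the formulas listed above, with '' leaving both values untouched, 'linear' rescaling the position, and 'dynamic' rescaling theta instead.

```python
def scaled_position_and_theta(position, seqlen, theta, scaling_type,
                              scaling_factor, max_position_embeddings,
                              rotary_dim):
    # Hypothetical helper sketching the scaling_type branches described above.
    if scaling_type == '':
        return position, theta          # no scaling
    if scaling_type == 'linear':
        return position / scaling_factor, theta
    if scaling_type == 'dynamic':
        # Stretch theta so overflowed positions stay in range.
        theta = theta * ((scaling_factor * (position + seqlen)
                          / max_position_embeddings)
                         - (scaling_factor - 1)) ** (rotary_dim / (rotary_dim - 2))
        return position, theta
    raise ValueError(f"unknown scaling_type: {scaling_type!r}")
```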
query: Input query tensor. Shape: (batch, seqlen, num_heads, head_dim).
key: Input key tensor. Shape: (batch, seqlen, num_heads, head_dim).
start_pos: Start position of the sequence.
pad_len: Padding length of each sequence. The position index of each batch b should start from start_pos - pad_len[b]. Shape: (batch,).
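To make the pad_len semantics concrete, here is a small sketch (positions is a hypothetical helper, not part of the operator) computing the position index start_pos + s - pad_len[b] for every token of every batch row:

```python
def positions(start_pos, seqlen, pad_len):
    # Position index of token s in batch row b is start_pos + s - pad_len[b],
    # so a left-padded row b starts at start_pos - pad_len[b].
    return [[start_pos + s - p for s in range(seqlen)] for p in pad_len]
```

For example, positions(0, 4, [0, 2]) gives [[0, 1, 2, 3], [-2, -1, 0, 1]]: the second row is shifted back by its two padding tokens, so its first real token still lands on position 0.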
rotated_query: Query tensor after rotary position embedding. Shape: same as query.
rotated_key: Key tensor after rotary position embedding. Shape: same as key.