Residual connection
output = Layer(input) + input
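A minimal sketch of the residual connection in plain Python, assuming a hidden state is just a list of floats; `residual`, `toy_layer`, and `zero_layer` are hypothetical names. It shows the key property of the skip path: if the layer contributes nothing, the input passes through unchanged.

```python
from typing import Callable, List

Vector = List[float]

def residual(layer: Callable[[Vector], Vector], x: Vector) -> Vector:
    # output = Layer(input) + input, element-wise
    y = layer(x)
    return [yi + xi for yi, xi in zip(y, x)]

def toy_layer(x: Vector) -> Vector:
    # Hypothetical stand-in for attention/FFN: scale every entry by 0.5
    return [0.5 * xi for xi in x]

def zero_layer(x: Vector) -> Vector:
    # A layer that outputs zeros; the residual reduces to the identity
    return [0.0 for _ in x]

x = [1.0, -2.0, 4.0]
print(residual(toy_layer, x))   # [1.5, -3.0, 6.0]
print(residual(zero_layer, x))  # [1.0, -2.0, 4.0]
```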
HC: Hyper-Connections
Layer input = H_pre × x (n streams → 1 input)
Layer output = Attention(input)   (or any other layer, e.g. an FFN)
Write back = H_post × output (1 output → n streams)
Mix streams = H_res × x (mix the n streams together)
New x = Mix streams + Write back
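The four HC steps above can be sketched for a single layer as follows. Assumptions not stated in the notes: x holds n streams, each a list of d floats; H_pre and H_post are length-n weight vectors; H_res is an n×n list of rows; the weights are fixed constants here rather than learned; `hc_step` and the toy layer are hypothetical.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]

def hc_step(x: List[Vector], H_pre: Vector, H_post: Vector,
            H_res: Matrix, layer: Callable[[Vector], Vector]) -> List[Vector]:
    n, d = len(x), len(x[0])
    # Layer input = H_pre × x: weighted sum of the n streams -> one vector
    layer_in = [sum(H_pre[j] * x[j][k] for j in range(n)) for k in range(d)]
    # Layer output = Layer(input)
    out = layer(layer_in)
    new_x = []
    for i in range(n):
        # Mix streams = H_res × x: row i of H_res combines all n streams
        mixed = [sum(H_res[i][j] * x[j][k] for j in range(n)) for k in range(d)]
        # Write back = H_post × output: stream i receives H_post[i] * output
        new_x.append([mixed[k] + H_post[i] * out[k] for k in range(d)])
    return new_x

double = lambda v: [2.0 * vi for vi in v]  # hypothetical toy layer

# n = 2 streams of dimension d = 2, identity mixing, write back only to stream 0
x = [[1.0, 2.0], [3.0, 4.0]]
print(hc_step(x, [0.5, 0.5], [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], double))
# [[5.0, 8.0], [3.0, 4.0]]

# With n = 1 and all weights 1, HC reduces to output = Layer(input) + input
print(hc_step([[1.0, -2.0]], [1.0], [1.0], [[1.0]], double))  # [[3.0, -6.0]]
```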
H_res is an n×n matrix. Example with n = 3:
H_res = [[0.6, 0.2, 0.2],
         [0.3, 0.5, 0.2],
         [0.1, 0.3, 0.6]]
New stream 0 = 0.6×stream₀ + 0.2×stream₁ + 0.2×stream₂
New stream 1 = 0.3×stream₀ + 0.5×stream₁ + 0.2×stream₂
New stream 2 = 0.1×stream₀ + 0.3×stream₁ + 0.6×stream₂
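A worked check of the three equations above, as a sketch that assumes each stream is a single number (real streams are vectors, and the same weights apply to every component); `mix_streams` is a hypothetical helper and the stream values are arbitrary.

```python
from typing import List

# The example 3×3 H_res from the notes
H_res = [
    [0.6, 0.2, 0.2],
    [0.3, 0.5, 0.2],
    [0.1, 0.3, 0.6],
]

def mix_streams(H: List[List[float]], streams: List[float]) -> List[float]:
    # New stream i = sum over j of H[i][j] * stream_j
    return [sum(h * s for h, s in zip(row, streams)) for row in H]

streams = [1.0, 2.0, 3.0]  # stream₀, stream₁, stream₂ (example values)
print([round(v, 6) for v in mix_streams(H_res, streams)])  # [1.6, 1.9, 2.5]
```

For instance, new stream 0 = 0.6×1 + 0.2×2 + 0.2×3 = 1.6.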
mHC: Manifold-Constrained Hyper-Connections
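Constraint on H_res: all entries non-negative, every row sums to 1, every column sums to 1 (i.e. H_res is a doubly stochastic matrix)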
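A sketch of what the three conditions buy, with hypothetical helpers `is_doubly_stochastic`, `matmul`, and `compose` and an assumed tolerance of 1e-9. With non-negative entries and rows summing to 1, each new stream is a weighted average of the old ones; with columns summing to 1, the total across streams is preserved. A standard fact is that a product of doubly stochastic matrices is again doubly stochastic, so applying the mixing across many layers stays bounded. The example H_res passes the check, while `H_bad` (non-negative, but every row and column sums to 1.2) grows geometrically when composed.

```python
from typing import List

Matrix = List[List[float]]

def is_doubly_stochastic(M: Matrix, tol: float = 1e-9) -> bool:
    # All entries non-negative, every row sums to 1, every column sums to 1
    nonneg = all(v >= 0 for row in M for v in row)
    rows_ok = all(abs(sum(row) - 1.0) <= tol for row in M)
    cols_ok = all(abs(sum(col) - 1.0) <= tol for col in zip(*M))
    return nonneg and rows_ok and cols_ok

def matmul(A: Matrix, B: Matrix) -> Matrix:
    # Plain-Python matrix product A × B
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def compose(M: Matrix, layers: int) -> Matrix:
    # M multiplied by itself `layers` times: the same mixing applied layer after layer
    n = len(M)
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(layers):
        P = matmul(M, P)
    return P

H_res = [[0.6, 0.2, 0.2],
         [0.3, 0.5, 0.2],
         [0.1, 0.3, 0.6]]
H_bad = [[0.6, 0.3, 0.3],
         [0.3, 0.6, 0.3],
         [0.3, 0.3, 0.6]]  # hypothetical unconstrained matrix

print(is_doubly_stochastic(H_res))               # True
print(is_doubly_stochastic(compose(H_res, 50)))  # True: closed under products
print(is_doubly_stochastic(H_bad))               # False
print(max(max(row) for row in compose(H_bad, 50)))  # large: grows like 1.2**50
```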
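Sinkhorn-Knopp algorithm: an iterative normalization technique that turns a matrix with positive entries into a doubly stochastic one by alternately normalizing its rows and columns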
import numpy as np

def make_doubly_stochastic(matrix, iterations=20):
    # Step 1: exponentiate so every entry is strictly positive
    M = np.exp(np.asarray(matrix, dtype=float))
    # Step 2: alternate row and column normalization
    for _ in range(iterations):
        # Normalize rows to sum to 1
        M = M / M.sum(axis=1, keepdims=True)
        # Normalize columns to sum to 1
        M = M / M.sum(axis=0, keepdims=True)
    # Columns now sum to exactly 1; rows sum to approximately 1,
    # and the approximation tightens as `iterations` grows
    return M
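The function above runs a fixed number of iterations. Below is a dependency-free sketch of the same Sinkhorn-Knopp iteration that instead stops once every row sum is within a tolerance of 1; `sinkhorn`, `tol`, and `max_iters` are hypothetical choices, not from the notes. After each column step the columns sum to 1 (up to floating point), so checking the rows is enough to know both conditions hold.

```python
import math
from typing import List

Matrix = List[List[float]]

def sinkhorn(logits: Matrix, tol: float = 1e-9, max_iters: int = 1000) -> Matrix:
    # Step 1: exp makes every entry strictly positive
    M = [[math.exp(v) for v in row] for row in logits]
    for _ in range(max_iters):
        # Normalize rows to sum to 1
        row_sums = [sum(row) for row in M]
        M = [[v / s for v in row] for row, s in zip(M, row_sums)]
        # Normalize columns to sum to 1
        col_sums = [sum(col) for col in zip(*M)]
        M = [[v / c for v, c in zip(row, col_sums)] for row in M]
        # Columns are now normalized; stop once rows are also within tol of 1
        if all(abs(sum(row) - 1.0) <= tol for row in M):
            break
    return M

logits = [[0.5, -1.0, 2.0],
          [1.5, 0.0, -0.5],
          [-2.0, 1.0, 0.3]]  # arbitrary example values
H = sinkhorn(logits)
print([round(sum(r), 6) for r in H])        # [1.0, 1.0, 1.0]
print([round(sum(c), 6) for c in zip(*H)])  # [1.0, 1.0, 1.0]
```

The early-stop test trades a little extra work per iteration for not having to guess a fixed iteration count.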