#AI#deepseek

Residual connection

output = Layer(input) + input
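
A minimal NumPy sketch of the formula above. The `layer` function is a hypothetical stand-in (a plain `tanh`) for whatever sublayer is wrapped; only the `+ x` skip path is the point here.

```python
import numpy as np

def layer(x):
    # Hypothetical stand-in for any sublayer (attention, MLP, ...).
    return np.tanh(x)

def residual_block(x):
    # output = Layer(input) + input
    return layer(x) + x

x = np.array([0.0, 1.0, -2.0])
out = residual_block(x)
print(out.shape)  # same shape as the input
```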

HC: Hyper-Connections

Layer input = H_pre × x (n streams → 1 input)  
Layer output = Attention(input)  
Write back = H_post × output (1 output → n streams)  

Mix streams = H_res × x (mix the n streams together)  

New x = Mix streams + Write back
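
The five steps above can be sketched with NumPy as follows. This is only an illustration under assumptions: `x` is stored as an `(n, d)` array (n streams of width d), `H_pre` has shape `(1, n)`, `H_post` has shape `(n, 1)`, `H_res` has shape `(n, n)`, all three are fixed arrays here, and `layer` is a hypothetical `tanh` stand-in for the attention sublayer.

```python
import numpy as np

def layer(h):
    # Hypothetical stand-in for the attention sublayer.
    return np.tanh(h)

def hyper_connection_step(x, H_pre, H_post, H_res):
    layer_in = H_pre @ x             # (1, n) @ (n, d) -> (1, d): n streams -> 1 input
    layer_out = layer(layer_in)      # (1, d)
    write_back = H_post @ layer_out  # (n, 1) @ (1, d) -> (n, d): 1 output -> n streams
    mixed = H_res @ x                # (n, n) @ (n, d) -> (n, d): mix the n streams
    return mixed + write_back        # new x

n, d = 3, 4
rng = np.random.default_rng(0)
x = rng.normal(size=(n, d))
H_pre = np.full((1, n), 1.0 / n)  # illustrative choice: average the streams
H_post = np.ones((n, 1))          # illustrative choice: write to every stream
H_res = np.eye(n)                 # illustrative choice: no mixing
new_x = hyper_connection_step(x, H_pre, H_post, H_res)

# With a single stream and all three matrices equal to [[1]],
# the step reduces to the plain residual connection.
x1 = rng.normal(size=(1, d))
one = np.ones((1, 1))
res1 = hyper_connection_step(x1, one, one, one)
```

With n = 1 and all three matrices set to `[[1]]`, the result equals `layer(x) + x`, which is a handy sanity check that the shapes and the order of operations match the residual case.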

H_res is an n×n matrix, for example with n = 3:

H_res = [0.6, 0.2, 0.2]  
        [0.3, 0.5, 0.2]   
        [0.1, 0.3, 0.6]
New stream 0 = 0.6×stream₀ + 0.2×stream₁ + 0.2×stream₂  
New stream 1 = 0.3×stream₀ + 0.5×stream₁ + 0.2×stream₂  
New stream 2 = 0.1×stream₀ + 0.3×stream₁ + 0.6×stream₂
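
As a quick numerical check of the mixing above, here is a NumPy sketch. The stream values 1.0, 2.0, 3.0 are made-up scalars chosen only for illustration; real streams would be vectors.

```python
import numpy as np

H_res = np.array([
    [0.6, 0.2, 0.2],
    [0.3, 0.5, 0.2],
    [0.1, 0.3, 0.6],
])

# Illustrative scalar streams (assumed values, not from the notes).
streams = np.array([1.0, 2.0, 3.0])

# Row i of H_res gives the weights used to build new stream i.
new_streams = H_res @ streams
print(new_streams)         # approximately [1.6, 1.9, 2.5]

print(H_res.sum(axis=1))   # row sums, each approximately 1
print(H_res.sum(axis=0))   # column sums, each approximately 1
```

Each column of this particular H_res also sums to 1, so the total across streams is unchanged by the mixing (6.0 before and after).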

mHC: Manifold-Constrained Hyper-Connections

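Constraint: all entries are non-negative, each row sums to 1, and each column sums to 1 (a doubly stochastic matrix).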
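
A small sketch of what this constraint means in code. The helper name `is_doubly_stochastic` and its tolerance are illustrative and do not come from any library. The example also checks a standard property of such matrices: the product of two doubly stochastic matrices is again doubly stochastic.

```python
import numpy as np

def is_doubly_stochastic(M, tol=1e-8):
    # Illustrative helper: square, non-negative, rows and columns sum to 1.
    M = np.asarray(M, dtype=float)
    return bool(
        M.ndim == 2
        and M.shape[0] == M.shape[1]
        and np.all(M >= -tol)
        and np.allclose(M.sum(axis=1), 1.0, atol=tol)
        and np.allclose(M.sum(axis=0), 1.0, atol=tol)
    )

A = np.array([[0.6, 0.2, 0.2],
              [0.3, 0.5, 0.2],
              [0.1, 0.3, 0.6]])
B = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])
C = np.array([[0.9, 0.1],
              [0.5, 0.5]])  # rows sum to 1, columns do not

print(is_doubly_stochastic(A))      # True
print(is_doubly_stochastic(C))      # False
print(is_doubly_stochastic(A @ B))  # True: closed under matrix product
```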

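Sinkhorn-Knopp algorithm: an iterative normalization technique that alternately rescales rows and columns until the matrix is approximately doubly stochastic.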

import numpy as np  

def make_doubly_stochastic(matrix, iterations=20):  
    # Step 1: Make all values positive  
    M = np.exp(matrix)  # exponentiation ensures positivity  
      
    # Step 2: Alternate row and column normalization  
    for _ in range(iterations):  
        # Normalize rows to sum to 1  
        M = M / M.sum(axis=1, keepdims=True)  
          
        # Normalize columns to sum to 1  
        M = M / M.sum(axis=0, keepdims=True)  
      
    return M  # Approximately doubly stochastic after enough iterations
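
To see how quickly the alternating normalization settles, here is a self-contained sketch that measures how far the row and column sums are from 1 after a given number of iterations. The function name `sinkhorn_knopp`, the sample `logits`, and the subtraction of the maximum before `exp` (a common numerical-stability tweak that does not change the result after normalization) are assumptions added for illustration.

```python
import numpy as np

def sinkhorn_knopp(logits, iterations=20):
    # Subtracting the max is an added stability tweak (assumption);
    # a constant factor cancels out in the first row normalization.
    M = np.exp(logits - logits.max())
    for _ in range(iterations):
        M = M / M.sum(axis=1, keepdims=True)  # rows sum to 1
        M = M / M.sum(axis=0, keepdims=True)  # columns sum to 1
    return M

def max_sum_error(M):
    # Largest deviation of any row or column sum from 1.
    row_err = np.abs(M.sum(axis=1) - 1.0).max()
    col_err = np.abs(M.sum(axis=0) - 1.0).max()
    return max(row_err, col_err)

# Illustrative unconstrained parameters (made-up values).
logits = np.array([[1.0, 0.0, 0.5],
                   [0.2, 1.5, 0.0],
                   [0.0, 0.3, 0.8]])

err_1 = max_sum_error(sinkhorn_knopp(logits, iterations=1))
err_20 = max_sum_error(sinkhorn_knopp(logits, iterations=20))
print(err_1, err_20)  # the error after 20 iterations is much smaller
```

Because the loop ends on a column normalization, the column sums are exact up to floating-point rounding, while the row sums are only approximately 1; the residual error shrinks as `iterations` grows.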

References