Update README.md
README.md CHANGED

@@ -49,7 +49,7 @@ Of note, I did not include any prior merges in this one; as I was noticing that
 I've been asked what this is.  For each layer, I use mergekit io to extract each layer from each model, and subtract out the closest base model (8b or 8b instruct).
 
 * Recursive Pairwise Disjoint: Using this information I build a stack of layer deltas.  I'm a little compute limited, so I treat them in pairs.  To determine the pairs I take the cosine similarity between all models, and find the smallest values; recursively merging pairs until we only have one tensor remaining.
-* Normalized: I take and divide each layer by it's norm, and then scale back up by multiplying the result by a midpoint from the norms of the tensors.
+* Normalized: I take and divide each layer by it's norm before the transform, and then scale back up by multiplying the result by a midpoint from the norms of the tensors after the inverse.  It's commutative, so it's more efficient to do it pre-complex.
 * Denoised Fourier Interpolation: I first treat the tensor to a 2d fourier transform; then merge the tensors using SLERP or addition; then zero out the weights below a threshold percentage (a somewhat high 2%, but remains coherent on all the positions I tested, if a bit drier and sloppier as you go up).
 
 ### Format
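
The three steps described in the changed README text can be sketched roughly as follows. This is a minimal NumPy illustration, not the author's actual code: the function names (`cosine_similarity`, `fourier_merge`, `least_similar_pairs`, `recursive_merge`) are hypothetical, the merge uses plain addition rather than SLERP, "midpoint" is assumed to mean the arithmetic mean of the two norms, the 2% threshold is assumed to be relative to the largest coefficient magnitude, and the pairing is assumed to be a greedy pick of the least similar pair. Layer deltas are assumed to be non-zero 2D arrays of the same shape.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two tensors, flattened to vectors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def fourier_merge(delta_a, delta_b, threshold=0.02):
    """Merge two layer deltas in the frequency domain (assumed details)."""
    # Normalize before the transform, while the data is still real-valued.
    norm_a = np.linalg.norm(delta_a)
    norm_b = np.linalg.norm(delta_b)
    # 2D Fourier transform of each unit-norm delta, merged by addition.
    freq = np.fft.fft2(delta_a / norm_a) + np.fft.fft2(delta_b / norm_b)
    # Denoise: zero coefficients below `threshold` of the peak magnitude.
    magnitude = np.abs(freq)
    freq[magnitude < threshold * magnitude.max()] = 0
    # Inverse transform; the imaginary part is only rounding error here.
    merged = np.fft.ifft2(freq).real
    # Scale back up by the midpoint (assumed: mean) of the two norms.
    return merged * (norm_a + norm_b) / 2


def least_similar_pairs(deltas):
    """Greedily form disjoint pairs, least similar pair first."""
    remaining = list(range(len(deltas)))
    pairs = []
    while len(remaining) > 1:
        i, j = min(
            ((i, j) for k, i in enumerate(remaining) for j in remaining[k + 1:]),
            key=lambda p: cosine_similarity(deltas[p[0]], deltas[p[1]]),
        )
        pairs.append((i, j))
        remaining.remove(i)
        remaining.remove(j)
    return pairs, remaining  # `remaining` holds at most one unpaired index


def recursive_merge(deltas):
    """Merge pairs repeatedly until a single tensor remains."""
    deltas = list(deltas)
    while len(deltas) > 1:
        pairs, leftover = least_similar_pairs(deltas)
        merged = [fourier_merge(deltas[i], deltas[j]) for i, j in pairs]
        deltas = merged + [deltas[k] for k in leftover]
    return deltas[0]
```

Because the Fourier transform is linear, dividing by the norm before `fft2` gives the same result as dividing the complex coefficients afterwards, which is presumably the sense of "commutative" in the new wording; doing it first means the division runs on real values only.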
