pcunwa/BS-EXP-SiameseRoformer · SiameseNorm Vs KEEL

SiameseNorm Vs KEEL

#2
by NilanE - opened

Have you had a chance to test https://arxiv.org/pdf/2601.19895?
I've found it to be stable in GAN training (a good stress test), especially when paired with Gated Attention (https://arxiv.org/pdf/2505.06708, also used in Qwen3-Next).

Owner

I haven’t tried it yet, but that sounds interesting.
Since this model already has a gating mechanism for each attention head, the two might work well together.
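For readers unfamiliar with per-head attention gating, here is a minimal NumPy sketch. It assumes the variant we understand the Gated Attention paper to favor: a sigmoid gate, computed from the same input tokens, applied elementwise and separately for each head to the scaled dot-product attention output before the output projection. The function name `gated_attention`, the weight names (`w_q`, `w_k`, `w_v`, `w_g`, `w_o`) and the `use_gate` flag are hypothetical, the attention is non-causal, and this is not the code of this repository or of Qwen3-Next.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_attention(x, w_q, w_k, w_v, w_g, w_o, num_heads, use_gate=True):
    """Non-causal multi-head self-attention with a per-head sigmoid output gate.

    x: (seq, d_model); every weight matrix is (d_model, d_model).
    Returns (output of shape (seq, d_model), gate of shape (heads, seq, d_head)).
    """
    seq, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):
        # (seq, d_model) -> (heads, seq, d_head)
        return t.reshape(seq, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    attn = softmax(scores) @ v                            # (heads, seq, d_head)

    if use_gate:
        # Hypothetical gate: each head gets its own slice of x @ w_g,
        # squashed to (0, 1) and multiplied elementwise with that head's output
        gate = sigmoid(split(x @ w_g))
    else:
        gate = np.ones_like(attn)

    merged = (attn * gate).transpose(1, 0, 2).reshape(seq, d_model)
    return merged @ w_o, gate
```

Because the gate sits between the attention output and the linear output projection, a head whose gate is driven toward 0 is effectively switched off for that token, which is the per-head control the reply above refers to.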
