LLM - a paisleypark Collection
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
• arXiv:2312.06134 • Published Dec 11, 2023 • 3 upvotes
Efficient Monotonic Multihead Attention
• arXiv:2312.04515 • Published Dec 7, 2023 • 8 upvotes
Contrastive Decoding Improves Reasoning in Large Language Models
• arXiv:2309.09117 • Published Sep 17, 2023 • 40 upvotes
Exploring Format Consistency for Instruction Tuning
• arXiv:2307.15504 • Published Jul 28, 2023 • 8 upvotes
Learning Universal Predictors
• arXiv:2401.14953 • Published Jan 26, 2024 • 22 upvotes
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
• arXiv:2401.15077 • Published Jan 26, 2024 • 20 upvotes
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
• arXiv:2401.15024 • Published Jan 26, 2024 • 73 upvotes
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
• arXiv:2401.14405 • Published Jan 25, 2024 • 13 upvotes
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
• arXiv:2401.14404 • Published Jan 25, 2024 • 18 upvotes
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
• arXiv:2401.10891 • Published Jan 19, 2024 • 62 upvotes
Time is Encoded in the Weights of Finetuned Language Models
• arXiv:2312.13401 • Published Dec 20, 2023 • 20 upvotes
Unsupervised Universal Image Segmentation
• arXiv:2312.17243 • Published Dec 28, 2023 • 20 upvotes
Reasons to Reject? Aligning Language Models with Judgments
• arXiv:2312.14591 • Published Dec 22, 2023 • 18 upvotes
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis
• arXiv:2312.13314 • Published Dec 20, 2023 • 8 upvotes
Cached Transformers: Improving Transformers with Differentiable Memory Cache
• arXiv:2312.12742 • Published Dec 20, 2023 • 13 upvotes
In-Context Learning Creates Task Vectors
• arXiv:2310.15916 • Published Oct 24, 2023 • 43 upvotes
Controlled Decoding from Language Models
• arXiv:2310.17022 • Published Oct 25, 2023 • 14 upvotes
CapsFusion: Rethinking Image-Text Data at Scale
• arXiv:2310.20550 • Published Oct 31, 2023 • 27 upvotes
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
• arXiv:2311.02262 • Published Nov 3, 2023 • 14 upvotes
Memory Augmented Language Models through Mixture of Word Experts
• arXiv:2311.10768 • Published Nov 15, 2023 • 19 upvotes
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
• arXiv:2310.15308 • Published Oct 23, 2023 • 23 upvotes
An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Concept Prompt Learning
• arXiv:2310.12274 • Published Oct 18, 2023 • 13 upvotes
Language Modeling Is Compression
• arXiv:2309.10668 • Published Sep 19, 2023 • 85 upvotes
Finite Scalar Quantization: VQ-VAE Made Simple
• arXiv:2309.15505 • Published Sep 27, 2023 • 24 upvotes
Vision Transformers Need Registers
• arXiv:2309.16588 • Published Sep 28, 2023 • 86 upvotes
(Untitled entry)
• arXiv:2309.03179 • Published Sep 6, 2023 • 31 upvotes
Gated recurrent neural networks discover attention
• arXiv:2309.01775 • Published Sep 4, 2023 • 10 upvotes
One Wide Feedforward is All You Need
• arXiv:2309.01826 • Published Sep 4, 2023 • 34 upvotes
Semantic-SAM: Segment and Recognize Anything at Any Granularity
• arXiv:2307.04767 • Published Jul 10, 2023 • 23 upvotes
Scaling MLPs: A Tale of Inductive Bias
• arXiv:2306.13575 • Published Jun 23, 2023 • 17 upvotes
MSViT: Dynamic Mixed-Scale Tokenization for Vision Transformers
• arXiv:2307.02321 • Published Jul 5, 2023 • 7 upvotes
CRAG -- Comprehensive RAG Benchmark
• arXiv:2406.04744 • Published Jun 7, 2024 • 46 upvotes