cv - a zzfive Collection
LocalMamba: Visual State Space Model with Windowed Selective Scan Paper
• 2403.09338
• Published Mar 14, 2024 • 8
GiT: Towards Generalist Vision Transformer through Universal Language Interface Paper
• 2403.09394
• Published Mar 14, 2024 • 26
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Paper
• 2402.19479
• Published Feb 29, 2024 • 35
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection Paper
• 2405.10300
• Published May 16, 2024 • 30
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model Paper
• 2406.20076
• Published Jun 28, 2024 • 10
SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive Canvas Layout Paper
• 2404.00412
• Published Mar 30, 2024 • 2
LKCell: Efficient Cell Nuclei Instance Segmentation with Large Convolution Kernels Paper
• 2407.18054
• Published Jul 25, 2024 • 12
Paper
• 2407.21017
• Published Jul 30, 2024 • 24
SAM 2: Segment Anything in Images and Videos Paper
• 2408.00714
• Published Aug 1, 2024 • 122
NeuFlow v2: High-Efficiency Optical Flow Estimation on Edge Devices Paper
• 2408.10161
• Published Aug 19, 2024 • 15
Sapiens: Foundation for Human Vision Models Paper
• 2408.12569
• Published Aug 22, 2024 • 94
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos Paper
• 2409.02095
• Published Sep 3, 2024 • 37
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper
• 2409.01704
• Published Sep 3, 2024 • 83
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection Paper
• 2409.08513
• Published Sep 13, 2024 • 14
OmniGen: Unified Image Generation Paper
• 2409.11340
• Published Sep 17, 2024 • 115
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think Paper
• 2409.11355
• Published Sep 17, 2024 • 30
Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors Paper
• 2409.17058
• Published Sep 25, 2024 • 13
Self-Supervised Any-Point Tracking by Contrastive Random Walks Paper
• 2409.16288
• Published Sep 24, 2024 • 6
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction Paper
• 2409.18124
• Published Sep 26, 2024 • 33
MinerU: An Open-Source Solution for Precise Document Content Extraction Paper
• 2409.18839
• Published Sep 27, 2024 • 41
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second Paper
• 2410.02073
• Published Oct 2, 2024 • 43
Towards Natural Image Matting in the Wild via Real-Scenario Prior Paper
• 2410.06593
• Published Oct 9, 2024 • 4
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree Paper
• 2410.16268
• Published Oct 21, 2024 • 70
SMITE: Segment Me In TimE Paper
• 2410.18538
• Published Oct 24, 2024 • 16
GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation Paper
• 2410.20474
• Published Oct 27, 2024 • 14
DELTA: Dense Efficient Long-range 3D Tracking for any video Paper
• 2410.24211
• Published Oct 31, 2024 • 9
Face Anonymization Made Simple Paper
• 2411.00762
• Published Nov 1, 2024 • 9
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements Paper
• 2411.12044
• Published Nov 18, 2024 • 14
SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning Paper
• 2411.10161
• Published Nov 15, 2024 • 9
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory Paper
• 2411.11922
• Published Nov 18, 2024 • 19
DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding Paper
• 2411.14347
• Published Nov 21, 2024 • 16
Knowledge Transfer Across Modalities with Natural Language Supervision Paper
• 2411.15611
• Published Nov 23, 2024 • 16
Edge Weight Prediction For Category-Agnostic Pose Estimation Paper
• 2411.16665
• Published Nov 25, 2024 • 6
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality Paper
• 2411.15241
• Published Nov 22, 2024 • 7
Scaling Image Tokenizers with Grouped Spherical Quantization Paper
• 2412.02632
• Published Dec 3, 2024 • 10
EMOv2: Pushing 5M Vision Model Frontier Paper
• 2412.06674
• Published Dec 9, 2024 • 13
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks Paper
• 2501.08326
• Published Jan 14, 2025 • 34
iFormer: Integrating ConvNet and Transformer for Mobile Application Paper
• 2501.15369
• Published Jan 26, 2025 • 13
MatAnyone: Stable Video Matting with Consistent Memory Propagation Paper
• 2501.14677
• Published Jan 24, 2025 • 34
PixelWorld: Towards Perceiving Everything as Pixels Paper
• 2501.19339
• Published Jan 31, 2025 • 17
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders Paper
• 2501.18052
• Published Jan 29, 2025 • 8
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding Paper
• 2503.10596
• Published Mar 13, 2025 • 18
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper
• 2503.11576
• Published Mar 14, 2025 • 156
Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation Paper
• 2503.21780
• Published Mar 27, 2025 • 9
TAPNext: Tracking Any Point (TAP) as Next Token Prediction Paper
• 2504.05579
• Published Apr 8, 2025 • 5
DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency Paper
• 2504.12080
• Published Apr 16, 2025 • 8
Group Downsampling with Equivariant Anti-aliasing Paper
• 2504.17258
• Published Apr 24, 2025 • 9
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis Paper
• 2505.09358
• Published May 14, 2025 • 27
PictSure: Pretraining Embeddings Matters for In-Context Learning Image Classifiers Paper
• 2506.14842
• Published Jun 16, 2025 • 7
Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision Paper
• 2507.20976
• Published Jul 28, 2025 • 11
IAUNet: Instance-Aware U-Net Paper
• 2508.01928
• Published Aug 3, 2025 • 9
A Coarse-to-Fine Approach to Multi-Modality 3D Occupancy Grounding Paper
• 2508.01197
• Published Aug 2, 2025 • 5
Paper
• 2508.10104
• Published Aug 13, 2025 • 303
UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning Paper
• 2509.18094
• Published Sep 22, 2025 • 4
SAM 3: Segment Anything with Concepts Paper
• 2511.16719
• Published Nov 20, 2025 • 134