Improving Robustness of Semantic Segmentation to Motion-Blur Using Class-Centric Augmentation

3DAvatarGAN Bridging Domains for Personalized Editable Avatars

Topology-Guided Multi-Class Cell Context Generation for Digital Pathology

Affection Learning Affective Explanations for Real-World Visual Dat

ShapeTalk A Language Dataset and Framework for 3D Shape Edits

Canonical Fields Self-Supervised Learning of Pose-Canonicalized Neural Fields

Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving

Multi-Realism Image Compression With a Conditional Generato

Interactive Cartoonization With Controllable Perceptual Factors

LINe Out-of-Distribution Detection by Leveraging Important Neurons

Neural Kaleidoscopic Space Sculpting

Neural Rate Estimator and Unsupervised Learning for Efficient Distributed Imag

Balanced Product of Calibrated Experts for Long-Tailed Recognition

HRDFuse Monocular 360° Depth Estimation by Collaboratively Learning Holistic-With-Regional Depth

MetaCLUE Towards Comprehensive Visual Metaphors Research

Hierarchical B-Frame Video Coding Using Two-Layer CANF Without Motion Coding

Look Radiate and Learn Self-Supervised Localisation via Radio-Visual Correspondenc

Is BERT Blind Exploring the Effect of Vision-and-Language Pretraining on

DC2 Dual-Camera Defocus Control by Learning To Refocus

RenderDiffusion Image Diffusion for 3D Reconstruction Inpainting and Generation

RangeViT Towards Vision Transformers for 3D Semantic Segmentation in Autonomous

PanoHead Geometry-Aware 3D Full-Head Synthesis in 360°

ZBS Zero-Shot Background Subtraction via Instance-Level Background Modeling and Foregroun

Deep Curvilinear Editing Commutative and Nonlinear Image Manipulation for Pretrain

BUFFER Balancing Accuracy Efficiency and Generalizability in Point Cloud Registration

CIRCLE Capture in Rich Contextual Environments

Ham2Pose Animating Sign Language Notation Into Pose Sequences

Learning Geometric-Aware Properties in 2D Representation Using Lightweight CAD Models

HierVL Learning Hierarchical Video-Language Embeddings

MaLP Manipulation Localization Using a Proactive Schem

Spider GAN Leveraging Friendly Neighbors To Accelerate GAN Training

Self-Supervised Learning From Images With a Joint-Embedding Predictive Architectu

TarViS A Unified Approach for Target-Based Video Segmentation

Generalizable Local Feature Pre-Training for Deformable Shape Analysis

Understanding and Improving Features Learned in Deep Functional Maps

HyperReel High-Fidelity 6-DoF Video With Ray-Conditioned Sampling

SpaText Spatio-Textual Representation for Controllable Image Generation

TempSAL - Uncovering Temporal Information for Deep Saliency Prediction

High-Res Facial Appearance Capture From Polarized Smartphone Images

A New Dataset Based on Images Taken by Blind Peopl

Test of Time Instilling Video-Language Models With a Sense o

Affordances From Human Videos as a Versatile Representation for Robotics

AUNet Learning Relations Between Action Units for Face Forgery Detection

Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation

FFHQ-UV Normalized Facial UV-Texture Dataset for 3D Face Reconstruction

GLeaD Improving GANs With a Generator-Leading Task

High-Fidelity Facial Avatar Reconstruction From Monocular Video With Generative Priors

Learning Personalized High Quality Volumetric Head Avatars From Monocular RGB

Masked Autoencoders Enable Efficient Knowledge Distillers

Sliced Optimal Partial Transport

Bayesian Posterior Approximation With Stochastic Ensembles

Learning Visual Representations via Language-Guided Sampling

AdaMAE Adaptive Masking for Efficient Spatiotemporal Learning With Masked Autoencoders

DualRefine Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling

Learning To Exploit Temporal Structure for Biomedical Vision-Language Processing

Neural Pixel Composition for 3D-4D View Synthesis From Multi-Views

All Are Worth Words A ViT Backbone for Diffusion Models

CiCo Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning

DexArt Benchmarking Generalizable Dexterous Manipulation With Articulated Objects

Object Discovery From Motion-Guided Tokens

SINE Semantic-Driven Image-Based NeRF Editing With Prior-Guided Editing Fiel

A Large-Scale Homography Benchmark

Finding Geometric Models by Clustering in the Consensus Spac

Attribute-Preserving Face Dataset Anonymization via Latent Code Optimization

Two-View Geometry Scoring Without Correspondences

Pseudo-Label Guided Contrastive Learning for Semi-Supervised Medical Image Segmentation

MaskSketch Unpaired Structure-Guided Masked Image Generation

RMLVQA A Margin Loss Approach for Visual Question Answering With

Galactic Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-per-Secon

Kernel Aware Resampl

Blowing in the Wind CycleNet for Human Cinemagraphs From Still

FlexiViT One Model for All Patch Sizes

A Light Touch Approach to Teaching Transformers Multi-View Geometry

CCuantuMM Cycle-Consistent Quantum-Hybrid Matching of Multiple Shapes

Person Image Synthesis via Denoising Diffusion Model

Sketch2Saliency Learning To Detect Salient Objects From Human Drawings

NoPe-NeRF Optimising Neural Radiance Field With No Pose Prio

Shortcomings of Top-Down Randomization-Based Sanity Checks for Evaluations of D

Probabilistic Debiasing of Scene Graphs

BEDLAM A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animat

Align Your Latents High-Resolution Video Synthesis With Latent Diffusion Models

Architectural Backdoors in Neural Networks

Meta Omnium A Benchmark for General-Purpose Learning-To-Learn

Neural Part Priors Learning To Optimize Part-Based Object Completion in

Instant Multi-View Head Capture Through Learnable Registration

DejaVu Conditional Regenerative Learning To Enhance Dense Prediction

Open-Set Likelihood Maximization for Few-Shot Learning

ALSO Automotive Lidar Self-Supervision by Occupancy Estimation

CR-FIQA Face Image Quality Assessment by Learning Sample Relative Classifiability

A-La-Carte Prompt Tuning (APT) Combining Distinct Data via Composable Prompting

Accelerated Coordinate Encoding Learning to Relocalize in Minutes Using RGB

A Probabilistic Framework for Lifelong Test-Time Adaptation

Open-Vocabulary Attribute Detection

Omni3D A Large Benchmark and Model for 3D Object Detection

InstructPix2Pix Learning To Follow Image Editing Instructions

Leveraging Inter-Rater Agreement for Classification in the Presence of Noisy

Learning and Aggregating Lane Graphs for Urban Automated Driving

LASP Text-to-Text Optimization for Language-Aware Soft Prompting of Vision

Rate Gradient Approximation Attack Threats Deep Spiking Neural Networks

Introducing Competition To Boost the Transferability of Targeted Adversarial Examples

Ensemble-Based Blackbox Attacks on Dense Prediction

MARLIN Masked Autoencoder for Facial Video Representation LearnINg

Multi-Centroid Task Descriptor for Dynamic Class Incremental Inferenc

NeuDA Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction

Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon

Orthogonal Annotation Benefits Barely-Supervised Medical Image Segmentation

RIAV-MVS Recurrent-Indexing an Asymmetric Volume for Multi-View Stereo

Source-Free Adaptive Gaze Estimation by Uncertainty Reduction

A New Comprehensive Benchmark for Semi-Supervised Video Anomaly Detection an

CiaoSR Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution

Contrastive Mean Teacher for Domain Adaptive Object Detectors

Event-Guided Person Re-Identification via Sparse-Dense Complementary Learning

HexPlane A Fast Representation for Dynamic Scenes

Iterative Proposal Refinement for Weakly-Supervised Video Grounding

Multi-View Azimuth Stereo via Tangent Space Consistency

Observation-Centric SORT Rethinking SORT for Robust Multi-Object Tracking

Physics-Guided ISO-Dependent Sensor Noise Modeling for Extreme Low-Light Photography

Real-Time Neural Light Field on Mobile Devices

Recurrent Homography Estimation Using Homography-Guided Image Warping and Focus Transform

Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching

SeSDF Self-Evolved Signed Distance Field for Implicit 3D Clothed Human

SVGformer Representation Learning for Continuous Vector Graphics Using Transformers

Three Guidelines You Should Know for Universally Slimmable Self-Supervised Learning

Few-Shot Semantic Image Synthesis With Class Affinity Trans

Towards Better Decision Forests Forest Alternating Optimization

Generalizing Dataset Distillation via Deep Generative Prio

Enlarging Instance-Specific and Class-Specific Information for Open-Set Action Recognition

CoMFormer Continual Learning in Semantic and Panoptic Segmentation

Unifying Short and Long-Term Tracking With Graph Hierarchies

An Image Quality Assessment Dataset for Portraits

LayoutDM Transformer-Based Diffusion Model for Layout Generation

Persistent Nature A Generative Model of Unbounded 3D Worlds

Recognizability Embedding Enhancement for Very Low-Resolution Face Recognition and Quality

Seeing With Sound Long-range Acoustic Beamforming for Multimodal Scene Understanding

Continuous Landmark Detection With 3D Queries

1000 FPS HDR Video With a Spike-RGB Hybrid Cam

An Erudite Fine-Grained Visual Classification Model

Depth Estimation From Indoor Panoramas With Neural Scene Representation

Domain Generalized Stereo Matching via Hierarchical Visual Transformation

L-CoIns Language-Based Colorization With Instance Awareness

Making Vision Transformers Efficient From a Token Sparsification View

Pointersect Neural Rendering With Cloud-Ray Intersection

Histopathology Whole Slide Image Analysis With Heterogeneous Graph Representation Learning

Equivalent Transformation and Dual Stream Network Construction for Mobile Imag

AVFace Towards Detailed Audio-Visual 4D Face Reconstruction

Data-Free Sketch-Based Image Retrieval

Learning To Generate Text-Grounded Mask for Open-World Semantic Segmentation From

Rebalancing Batch Normalization for Exemplar-Based Class-Incremental Learning

Privacy-Preserving Representations Are Not Enough Recovering Scene Content From Cam

BoxTeacher Exploring High-Quality Pseudo Labels for Weakly Supervised Instance Segmentation

M6Doc A Large-Scale Multi-Format Multi-Type Multi-Layout Multi-Language Multi-Annotation Category Dataset

Out-of-Candidate Rectification for Weakly Supervised Semantic Segmentation

Panoptic Compositional Feature Field for Editable Scene Rendering With Network-In

SDFusion Multimodal 3D Shape Completion Reconstruction and Generation

VindLU A Recipe for Effective Video-and-Language Pretraining

WildLight In-the-Wild Inverse Rendering With a Flashlight

Activating More Pixels in Image Super-Resolution Transform

Affordance Grounding From Demonstration Video To Target Imag

AnchorFormer Point Cloud Completion From Discriminative Nodes

A Unified Knowledge Distillation Framework for Deep Directed Graphical Models

Better CMOS Produces Clearer Images Learning Space-Variant Blur Estimation fo

Beyond Appearance A Semantic Controllable Self-Supervised Learning Framework for Human-Centric

Boosting Semi-Supervised Learning by Exploiting All Unlabeled Dat

Boundary Unlearning Rapid Forgetting of Deep Networks via Shifting th

Cascaded Local Implicit Transformer for Arbitrary-Scale Super-Resolution

Cascade Evidential Learning for Open-World Weakly-Supervised Temporal Action Localization

CLIP2Scene Towards Label-Efficient 3D Scene Understanding by CLI

DAA A Delta Age AdaIN Operation for Age Estimation vi

DBARF Deep Bundle-Adjusting Generalizable Neural Radiance Fields

DeepMapping2 Self-Supervised Large-Scale LiDAR Map Optimization

Detecting Human-Object Contact in Images

DisCo-CLIP A Distributed Contrastive Loss for Memory Efficient CLIP Training

Divide and Conquer Answering Questions With Object Factorization and Compositional

DPF Learning Dense Prediction Fields With Weak Supervision

Effective Ambiguity Attack Against Passport-Based DNN Intellectual Property Protection Schemes

Elastic Aggregation for Federated Optimization

End-to-End 3D Dense Captioning With Vote2Cap-DET

Enhanced Multimodal Representation Learning With Cross-Modal KD

Enhanced Training of Query-Based Object Detection via Selective Query Recollection

Executing Your Commands via Motion Diffusion in Latent Spac

Extracting Class Activation Maps From Non-Discriminative Features As Well

FFF Fragment-Guided Flexible Fitting for Building Complete Protein Structures

From Node Interaction To Hop Interaction New Effective and Scalabl

Generative Semantic Segmentation

GM-NeRF Learning Generalizable Model-Based Neural Radiance Fields From Multi-View Images

gSDF Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction

Hand Avatar Free-Pose Hand Animation and Rendering From Monocular Video

HNeRV A Hybrid Neural Representation for Videos

Human Guided Ground-Truth Generation for Realistic Image Super-Resolution

Imitation Learning As State Matching via Differentiable Physics

Implicit Neural Head Synthesis via Controllable Local Deformation Fields

Improved Test-Time Adaptation for Domain Generalization

iQuery Instruments As Queries for Audio-Visual Sound Separation

LargeKernel3D Scaling Up Kernels in 3D Sparse CNNs

Learning a Deep Color Difference Metric for Photographic Images

Learning a Sparse Transformer Network for Effective Image Deraining

Learning From Unique Perspectives User-Aware Saliency Modeling

Learning the Distribution of Errors in Stereo Matching for Joint

Local-to-Global Registration for Bundle-Adjusting Neural Radiance Fields

MagicNet Semi-Supervised Multi-Organ Segmentation via Magic-Cube Partition and Recovery

MammalNet A Large-Scale Video Benchmark for Mammal Recognition and Behavio

Masked Image Training for Generalizable Deep Image Denoising

Meta-Causal Learning for Single Domain Generalization

Mixed Autoencoder for Self-Supervised Visual Representation Learning

MobileNeRF Exploiting the Polygon Rasterization Pipeline for Efficient Neural Fiel

Mod-Squad Designing Mixtures of Experts As Modular Multi-Task Learners

Movies2Scenes Using Movie Metadata To Learn Scene Representation

Multivariate Multi-Frequency and Multimodal Rethinking Graph Neural Networks for Emotion

NeuralEditor Editing Neural Radiance Fields via Manipulating Point Clouds

Novel-View Acoustic Synthesis

OvarNet Towards Open-Vocabulary Object Attribute Recognition

PAniC-3D Stylized Single-View 3D Reconstruction From Portraits of Anime Characters

PiMAE Point Cloud and Image Interactive Masked Autoencoders for 3D

Private Image Generation With Dual-Purpose Auxiliary Classifi

RankMix Data Augmentation for Weakly Supervised Learning of Classifying Whol

Revisiting Multimodal Representation in Contrastive Learning From Patch and Token

Run Don't Walk Chasing Higher FLOPS for Faster Neural Networks

ScaleDet A Scalable Multi-Dataset Object Detecto

Seeing Beyond the Brain Conditional Diffusion Model With Sparse Mask

SeqTrack Sequence to Sequence Learning for Visual Object Tracking

SparseViT Revisiting Activation Sparsity for Efficient High-Resolution Vision Transform

TexPose Neural Texture Learning for Self-Supervised 6D Object Pose Estimation

The Dark Side of Dynamic Routing Neural Networks Towards Efficiency

Towards Modality-Agnostic Person Re-Identification With Descriptive Query

Train-Once-for-All Personalization

Transfer Knowledge From Head to Tail Uncertainty Calibration Under Long-Tail

TrojDiff Trojan Attacks on Diffusion Models With Diverse Targets

Understanding and Improving Visual Prompting A Label-Mapping Perspectiv

Unsupervised Inference of Signed Distance Functions From Single Sparse Point

Unsupervised Sampling Promoting for Stochastic Human Trajectory Prediction

UV Volumes for Real-Time Rendering of Editable Free-View Human Performanc

ViewNet A Novel Projection-Based Backbone With View Pooling for Few-Shot

Viewpoint Equivariance for Multi-View 3D Object Detection

ViLEM Visual-Language Error Modeling for Image-Text Retrieval

VoxelNeXt Fully Sparse VoxelNet for 3D Object Detection and Tracking

Are Deep Neural Networks SMARTer Than Second Graders

Reproducible Scaling Laws for Contrastive Language-Image Learning

Image Quality-Aware Diagnosis via Meta-Knowledge Co-Embedding

Automatic High Resolution Wire Segmentation and Removal

AdamsFormer for Spatial Action Localization in the Futu

BEV-SAN Accurate BEV 3D Object Detection via Slice Attention Networks

HDR Imaging With Spatially Varying Signal-to-Noise Ratios

Adversarial Normalization I Can Visualize Everything (ICE)

Balanced Energy Regularization Loss for Out-of-Distribution Detection

Balanced Spherical Grid for Egocentric View Synthesis

Dynamic Neural Network for Multi-Task Learning Searching Across Diverse Network

Local-Guided Global Paired Similarity Representation for Visual Reinforcement Learning

MAIR Multi-View Attention Inverse Rendering With 3D Spatially-Varying Lighting Estimation

N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution

Progressive Random Convolutions for Single Domain Generalization

Restoration of Hand-Drawn Architectural Drawings Using Latent Space Mapping With

TMO Textured Mesh Acquisition of Objects With a Mobile Devic

Context-Aware Relative Object Queries To Unify Video Instance and Panoptic

How to Backdoor Diffusion Models

SceneTrilogy On Human Scene-Sketch and Its Complementarity With Photo an

What Can Human Sketches Do for Object Detection

STDLens Model Hijacking-Resilient Federated Learning for Object Detection

Generative Bias for Robust Visual Question Answering

Implicit 3D Human Mesh Recovery Using Consistency With Pose an

itKD Interchange Transfer-Based Knowledge Distillation for 3D Object Detection

Learning Adaptive Dense Event Stereo From the Image Domain

Look Around for Anomalies Weakly-Supervised Anomaly Detection via Context-Motion Relational

PartDistillation Learning Parts From Instance Segmentation

Transformer-Based Unified Recognition of Two Hands Manipulating Objects

Learning Human-to-Robot Handovers From Point Clouds

Regularization of Polynomial Networks for Image Recognition

Shakes on a Plane Unsupervised Depth Estimation From Unstabilized Photography

Parallel Diffusion Models of Operator and Image for Blind Invers

Solving 3D Inverse Problems Using Pre-Trained 2D Diffusion Models

BUOL A Bottom-Up Framework With Occupancy-Aware Lifting for Panoptic 3D

Command-Driven Articulated Object Understanding and Manipulation

GFPose Learning 3D Human Pose Prior With Gradient Fields

UniHCP A Unified Model for Human-Centric Perceptions

RealImpact A Dataset of Impact Sound Fields for Real Objects

Where We Are and What We're Looking At Query Bas

Combining Implicit-Explicit View Correlation for Light Field Semantic Segmentation

Learning To Dub Movies via Hierarchical Prosody Models

Structured 3D Features for Reconstructing Controllable Avatars

The Differentiable Lens Compound Lens Search Over Glass Surfaces an

Seasoning Model Soups for Robustness to Adversarial and Natural Distribution

Biomechanics-Guided Facial Action Unit Detection Through Force Modeling

Feature Aggregated Queries for Transformer-Based Video Object Detectors

KD-DLGAN Data Limited Image Generation via Knowledge Distillation

Learning Joint Latent Space EBM Prior Model for Multi-Layer Generato

Multi-Modal Gait Recognition via Effective Spatial-Temporal Feature Fusion

Neuralizer General Neuroimage Analysis Without Re-Training

Mofusion A Framework for Denoising-Diffusion-Based Motion Synthesis

Disentangling Writer and Character Styles for Handwriting Generation

Hybrid Neural Rendering for Large-Scale Scenes With Motion Blu

Nighttime Smartphone Reflective Flare Removal Using Optical Center Symmetry Prio

SLOPER4D A Scene-Aware Dataset for Global 4D Human Pose Estimation

Improving Selective Visual Question Answering by Learning From Your Peers

Thermal Spread Functions (TSF) Physics-Guided Material Classification

Learning Expressive Prompting With Residuals for Vision Transformers

Weakly-Supervised Domain Adaptive Semantic Segmentation With Prototypical Contrastive Learning

TimeBalance Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition

3D Highlighter Localizing Regions on 3D Shapes via Text Descriptions

Objaverse A Universe of Annotated 3D Objects

Phone2Proc Bringing Robust Robots Into Our Chaotic Worl

Meta-Tuning Loss Functions and Data Augmentation for Few-Shot Object Detection

3D-Aware Conditional Image Synthesis

Harmonious Teacher for Cross-Domain Object Detection

Learning Detailed Radiance Manifolds for High-Fidelity and 3D-Consistent Portrait Synthesis

NeRDi Single-View NeRF Synthesis With Language-Guided Diffusion As General Imag

PointVector A Vector Representation in Point Cloud Analysis

SE-ORNet Self-Ensembling Orientation-Aware Network for Unsupervised Point Cloud Shape Correspondenc

Therbligs in Action Video Understanding Through Motion Primitives

Cross-Domain Image Captioning With Discriminative Finetuning

Learning a Depth Covariance Function

Reliability in Semantic Segmentation Are We on the Right Track

DrapeNet Garment Generation and Self-Supervised Draping

Unbalanced Optimal Transport A Unified Framework for Object Detection

IterativePFN True Iterative Point Cloud Filtering

CAP Robust Point Cloud Classification via Semantic and Structural Modeling

DiffusionRig Learning Personalized Priors for Facial Appearance Editing

Exploring Structured Semantic Prior for Multi Label Recognition With Incomplet

HGFormer Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation

Hidden Gems 4D Radar Scene Flow Learning Using Cross-Modal Supervision

Mitigating Task Interference in Multi-Task Learning via Explicit Task Routing

Network Expansion for Practical Training Acceleration

PLA Language-Driven Open-Vocabulary 3D Scene Understanding

Revisiting the P3P Problem

Visual Dependency Transformers Dependency Tree Emerges From Reversed Attention

Robust Mean Teacher for Continual and Gradual Test-Time Adaptation

Sphere-Guided Training of Neural Implicit Surfaces

Adversarial Robustness via Random Projection Filters

Benchmarking Robustness of 3D Object Detection to Common Corruptions

DisWOT Student Architecture Search for Distillation WithOut Training

Fast Monocular Scene Reconstruction With Global-Sparse Local-Dense Grids

Federated Incremental Semantic Segmentation

Implicit Identity Leakage The Stumbling Block to Improving Deepfake Detection

MaskCLIP Masked Self-Distillation Advances Contrastive Language-Image Pretraining

Residual Degradation Learning Unfolding Framework With Mixing Priors Across Spectral

Rethinking Optical Flow From Geometric Matching Consistent Perspectiv

The Enemy of My Enemy Is My Friend Exploring Invers

Weakly Supervised Video Representation Learning With Unaligned Text for Sequential

GaitGCI Generative Counterfactual Intervention for Gait Recognition

Multiplicative Fourier Level of Detail

Teaching Structured Vision Language Concepts to Vision Languag

Quantitative Manipulation of Custom Attributes on 3D-Aware Image Synthesis

Federated Learning With Data-Agnostic Distribution Fusion

RWSC-Fusion Region-Wise Style-Controlled Fusion Network for the Prohibited X-Ray Security

Burstormer Burst Image Restoration and Enhancement Transform

Modular Memorability Tiered Representations for Video Memorability Prediction

Adaptive Sparse Convolutional Networks With Global Context Enhancement for Fast

Avatars Grow Legs Generating Smooth Human Motion From Sparse Tracking

Conditional Generation of Audio From Video via Foley Analogies

Dual-Bridging With Adversarial Noise Generation for Domain Adaptive rPPG Estimation

Efficient Mask Correction for Click-Based Interactive Image Segmentation

Global and Local Mixture Consistency Cumulative Learning for Long-Tailed Visual

Learning To Render Novel Views From Wide-Baseline Stereo Pairs

Minimizing the Accumulated Trajectory Error To Improve Dataset Distillation

No One Left Behind Improving the Worst Categories in Long-Tail

Object-Goal Visual Navigation via Effective Exploration of Relations Among Historical

On-the-Fly Category Discovery

Rethinking the Approximation Error in 3D Surface Fitting for Point

SuperDisco Super-Class Discovery Improves Visual Recognition for the Long-Tail

Weak-Shot Object Detection Through Mutual Knowledge Trans

StepFormer Self-Supervised Step Discovery and Localization in Instructional Videos

DKM Dense Kernelized Feature Matching for Geometry Estimation

G-MSM Unsupervised Multi-Shape Matching With Graph-Based Affinity Priors

Why Is the Winner the Best

EvShutter Transforming Events for Unconstrained Rolling Shutter Correction

DepGraph Towards Any Structural Pruning

Efficient Robust Principal Component Analysis via Block Krylov Iteration an

EVA Exploring the Limits of Masked Visual Representation Learning at

Learning Analytical Posterior Probability for Human Mesh Recovery

Self-Supervised Non-Uniform Kernel Estimation With Flow-Based Motion Prior for Blin

TBP-Former Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction

You Can Ground Earlier Than See An Effective and Efficient

ARCTIC A Dataset for Dexterous Bimanual Hand-Object Manipulation

Joint Appearance and Motion Learning for Efficient Rolling Shutter Correction

OpenGait Revisiting Gait Recognition Towards Better Practicality

PMR Prototypical Modal Rebalance for Multimodal Learning

PointListNet Deep Learning on 3D Point Lists

SelfME Self-Supervised Motion Learning for Micro-Expression Recognition

Quantum Multi-Model Fitting

Generative Diffusion Prior for Unified Image Restoration and Enhancement

Masked Auto-Encoders Meet Generative Adversarial Networks and Beyon

CRAFT Concept Recursive Activation FacTorization for Explainability

Don't Lie to Me Robust and Efficient Explainability With Verifi

3D Spatial Multimodal Knowledge Accumulation for Scene Graph Prediction in

AeDet Azimuth-Invariant Multi-View 3D Object Detection

Detecting Backdoors in Pre-Trained Encoders

Dynamic Generative Targeted Attacks With Pattern Injection

ERNIE-ViLG 2.0 Improving Text-to-Image Diffusion Model With Knowledge-Enhanced Mixture-of-Denoising-Experts

Evolved Part Masking for Self-Supervised Learning

Generating Aligned Pseudo-Supervision From Non-Aligned Data for Image Restoration in

Learning Federated Visual Prompt in Null Space for MRI Reconstruction

MaskCon Masked Contrastive Learning for Coarse-Labelled Dataset

Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in

Network-Free Unsupervised Semantic Segmentation With Synthetic Images

Neural Dependencies Emerging From Learning Massive Categories

NVTC Nonlinear Vector Transform Coding

OT-Filter An Optimal Transport Filter for Learning With Noisy Labels

Probing Sentiment-Oriented Pre-Training Inspired by Human Sentiment Perception Mechanism

RONO Robust Discriminative Learning With Noisy Labels for 2D-3D Cross-Modal

Self-Supervised Video Forensics by Audio-Visual Anomaly Detection

Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification

Uncertainty-Aware Vision-Based Metric Cross-View Geolocalization

Semi-Supervised Learning Made Simple With Self-Supervised Clustering

Tree Instance Segmentation With Temporal Contour Graph

Plateau-Reduced Differentiable Path Tracing

System-Status-Aware Adaptive Network for Online Streaming Video Understanding

Unified Pose Sequence Modeling

Reconstructing Signing Avatars From Video Using Linguistic Priors

Leveraging Temporal Context in Low Representational Power Regimes

Batch Model Consolidation A Multi-Task Model Consolidation Framework

Probing Neural Representations of Scene Perception in a Hippocampally Dependent

K-Planes Explicit Radiance Fields in Space Time and Appearanc

The Best Defense Is a Good Offense Adversarial Augmentation Against

VIVE3D Viewpoint-Independent Video Editing Using 3D-Aware GANs

Controllable Light Diffusion for Portraits

An Empirical Study of End-to-End Video-Language Transformers With Masked Visual

Auto-CARD Efficient and Robust Codec Avatar Driving for Real-Time Mobil

Learning a Simple Low-Light Image Enhancer From Paired Low-Light Instances

Learning Semantic Relationship Among Instances for Image-Text Matching

Neural Transformation Fields for Arbitrary-Styled Font Generation

sRGB Real Noise Synthesizing With Neighboring Correlation-Aware Noise Model

StyleAdv Meta Style Adversarial Training for Cross-Domain Few-Shot Learning

Tell Me What Happened Unifying Text-Guided Video Completion via Multimodal

You Do Not Need Additional Priors or Regularizers in Retinex-Bas

CoWs on Pasture Baselines and Benchmarks for Language-Driven Zero-Shot Object

CNVid-3.5M Build Filter and Pre-Train the Large-Scale Public Chinese Video-Text

Collaborative Noisy Label Cleaner Learning Scene-Aware Trailers for Multi-Modal Highlight

Adaptive Zone-Aware Hierarchical Planner for Vision-Language Navigation

AsyFOD An Asymmetric Adaptation Paradigm for Few-Shot Domain Adaptive Object

Backdoor Defense via Adaptively Splitting Poisoned Dataset

Back to the Source Diffusion-Driven Adaptation To Test-Time Corruption

Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception

Decompose More and Aggregate Better Two Closer Looks at Frequency

DKT Diverse Knowledge Transfer Transformer for Class Incremental Learning

Exploring Data Geometry for Continual Learning

Flexible-Cm GAN Towards Precise 3D Dose Prediction in Radiotherapy

Generalized Relation Modeling for Transformer Tracking

High-Fidelity and Freely Controllable Talking Head Video Generation

Implicit Diffusion Models for Continuous Super-Resolution

MIST Multi-Modal Iterative Spatial-Temporal Transformer for Long-Form Video Question Answering

SurfelNeRF Neural Surfel Radiance Fields for Online Photorealistic Reconstruction o

The ObjectFolder Benchmark Multisensory Learning With Neural and Real Objects

ULIP Learning a Unified Representation of Language Images and Point

VisFusion Visibility-Aware Online 3D Scene Reconstruction From Videos

Uncurated Image-Text Datasets Shedding Light on Demographic Bias

Samples With Low Loss Curvature Improve Data Efficiency

Transformer-Based Learned Optimization

Recurrent Vision Transformers for Object Detection With Event Cameras

Dense-Localizing Audio-Visual Events in Untrimmed Videos A Large-Scale Benchmark an

GAPartNet Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable an

Human Pose As Compositional Tokens

Learning Neural Volumetric Representations of Dynamic Humans in Minutes

PartManip Learning Cross-Category Generalizable Part Manipulation Policy From Point Clou

Hyperbolic Contrastive Learning for Visual Representations Beyond Objects

Improving Zero-Shot Generalization and Robustness of Multi-Modal Models

Policy Adaptation From Foundation Model Feedback

Learned Two-Plane Perspective Prior Based Image Resampling for Efficient Object

Real-Time Evaluation in Online Continual Learning A New Ho

Learning Neural Parametric Head Models

Iterative Next Boundary Detection for Instance Segmentation of Tree Rings

Latency Matters Real-Time Action Forecasting Transform

ImageBind One Embedding Space To Bind Them All

OmniMAE Single Model Masked Pretraining on Images and Videos

Interactive Segmentation of Radiance Fields

Video Compression With Entropy-Constrained Neural Representations

Continuous Pseudo-Label Rectified Domain Adaptive Semantic Segmentation With Implicit Neural

DiffPose Toward More Reliable 3D Pose Estimation

MMG-Ego4D Multimodal Generalization in Egocentric Action Recognition

SkyEye Self-Supervised Bird's-Eye-View Semantic Mapping Using Monocular Frontal View Images

LiDAR-in-the-Loop Hyperparameter Optimization

Leveraging per Image-Token Consistency for Vision-Language Pre-Training

Rethinking Image Super Resolution From Long-Tailed Distribution Learning Perspectiv

Finetune Like You Pretrain Improved Finetuning of Zero-Shot Vision Models

Towards Practical Plug-and-Play Diffusion Models

PaCa-ViT Learning Patch-to-Cluster Attention in Vision Transformers

HOOD Hierarchical Graphs for Generalized Modelling of Clothing Dynamics

Image Super-Resolution Using T-Tetromino Pixels

Self-Supervised Implicit Glyph Attention for Text Recognition

StyleSync High-Fidelity Generalized and Personalized Lip Sync in Style-Based Generato

MACARONS Mapping and Coverage Anticipation With RGB Online Self-Supervision

PCT-Net Full Resolution Image Harmonization Using Pixel-Wise Color Transformations

TruFor Leveraging All-Round Clues for Trustworthy Image Forgery Detection an

NIFF Alleviating Forgetting in Generalized Few-Shot Object Detection via Neural

ObjectMatch Robust Registration Using Canonical Object Correspondences

Modernizing Old Photos Using Multiple References via Photorealistic Style Trans

ALOFT A Lightweight MLP-Like Architecture With Dynamic Low-Frequency Transform fo

Class Attention Transfer Based Knowledge Distillation

Dealing With Cross-Task Class Discrimination in Online Continual Learning

DINN360 Deformable Invertible Neural Network for Latitude-Aware 360deg Image Rescaling

Distilling Cross-Temporal Contexts for Continuous Sign Language Recognition

From Images to Textual Prompts Zero-Shot Visual Question Answering With

GANmouflage 3D Object Nondetection With Texture Fields

HandNeRF Neural Radiance Fields for Animatable Interacting Hands

Hierarchical Fine-Grained Image Forgery Detection and Localization

Improving Robustness of Vision Transformers by Reducing Sensitivity To Patch

Knowledge Distillation for 6D Pose Estimation by Aligning Distributions o

Learning a Practical SDR-to-HDRTV Up-Conversion Using New Dataset and Degradation

ShadowDiffusion When Degradation Prior Meets Diffusion Model for Shadow Removal

Texts as Images in Prompt Tuning for Multi-Label Image Recognition

Vid2Avatar 3D Avatar Reconstruction From Videos in the Wild vi

Zero-Shot Generative Model Adaptation via Image-Specific Prompt Learning

Class Prototypes Based Contrastive Learning for Classifying Multi-Label and Fine-Grain

Visual Programming Compositional Visual Reasoning Without Training

Mobile User Interface Element Detection via Adaptively Prompt Tuning

MSINet Twins Contrastive Search of Multi-Scale Interaction for Object ReID

Preserving Linear Separability in Continual Learning by Backward Feature Projection

Text With Knowledge Graph Augmented Transformer for Video Captioning

ViP3D End-to-End Visual Trajectory Prediction via 3D Agent Queries

Unified Keypoint-Based Action Recognition Framework via Structured Keypoint Pooling

Best of Both Worlds Multimodal Contrastive Learning With Tabular an

Rigidity-Aware Detection for 6D Object Pose Estimation

Shape-Constraint Recurrent Flow for 6D Object Pose Estimation

A Strong Baseline for Generalized Few-Shot Semantic Segmentation

Hierarchical Neural Memory Network for Low Latency Event Processing

In-Hand 3D Object Scanning From an RGB Sequenc

Efficient Verification of Neural Networks Against LVM-Based Specifications

ABCD Arbitrary Bitwise Coefficient for De-Quantization

AstroNet When Astrocyte Meets Artificial Neural Network

AutoAD Movie Description in Context

FAME-ViL Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks

FashionSAP Symbols and Attributes Prompt for Fine-Grained Fashion Vision-Language Pre-Training

High-Fidelity 3D Human Digitization From Single 2K Resolution Images

High-Fidelity Event-Radiance Recovery via Transient Event Frequency

Learning a 3D Morphable Face Reflectance Model From Low-Cost Dat

Multiscale Tensor Decomposition and Rendering Equation Encoding for View Synthesis

Noisy Correspondence Learning With Meta Similarity Correction

Reinforcement Learning-Based Black-Box Model Inversion Attacks

Dual Alignment Unsupervised Domain Adaptation for Video-Text Retrieval

Learning Attention As Disentangler for Compositional Zero-Shot Learning

Semidefinite Relaxations for Robust Multiview Triangulation

Neighborhood Attention Transform

A Generalized Framework for Video Instance Segmentation

CARTO Category and Joint Agnostic Reconstruction of ARTiculated Objects

3D Video Object Detection With Learnable Object-Centric Global Optimization

Align and Attend Multimodal Summarization With Dual Contrastive Losses

Analyzing and Diagnosing Pose Estimation With Attributions

A Rotation-Translation-Decoupled Solution for Robust and Efficient Visual-Inertial Initialization

Camouflaged Object Detection With Feature Decomposition and Edge Reconstruction

CLIP-S4 Language-Guided Self-Supervised Semantic Segmentation

Compositor Bottom-Up Clustering and Compositing for Robust Part and Object

D2Former Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-Bas

Dynamic Focus-Aware Positional Queries for Semantic Segmentation

FastInst A Simple Query-Based Model for Real-Time Instance Segmentation

Few-Shot Geometry-Aware Keypoint Localization

Frustratingly Easy Regularization on Representation Can Boost Deep Reinforcement Learning

Geometric Visual Similarity Learning in 3D Medical Image Self-Supervised Pre-Training

Grad-PU Arbitrary-Scale Point Cloud Upsampling via Gradient Descent With Learn

MSF Motion-Guided Sequential Fusion for Efficient 3D Object Detection From

Primitive Generation and Semantic-Related Alignment for Universal Zero-Shot Segmentation

Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation

Towards Scalable Neural Representation for Diverse Videos

MOVES Manipulated Objects in Video Enable Segmentation

Model-Agnostic Gender Debiased Image Captioning

3D Concept Learning and Reasoning From Multi-View Images

ACL-SPC Adaptive Closed-Loop System for Self-Supervised Point Cloud Completion

Watch or Listen Robust Audio-Visual Speech Recognition With Visual Corruption

Evading DeepFake Detectors via Adversarial Statistical Consistency

Mask3D Pre-Training 2D Vision Transformers by Learning Masked 3D Priors

MIC Masked Image Consistency for Context-Enhanced Domain Adaptation

Learning Locally Editable Virtual Humans

Four-View Geometry With Unknown Radial Distortion

Towards Compositional Adversarial Robustness Generalizing Adversarial Training to Composite Semantic

NS3D Neuro-Symbolic Grounding of 3D Objects and Relations

PosterLayout A New Benchmark and Approach for Content-Aware Visual-Textual Presentation

ReVISE Self-Supervised Speech Resynthesis With Visual Input for Universal an

Adaptive Assignment for Geometry Aware Local Feature Matching

Anchor3DLane Learning To Regress 3D Anchors for Monocular 3D Lan

Boosting Accuracy and Robustness of Student Models via Adaptive Adversarial

Clover Towards a Unified Video-Language Alignment and Fusion Model

Collaborative Diffusion for Multi-Modal Face Generation and Editing

Contrastive Semi-Supervised Learning for Underwater Image Restoration via Reliable Bank

CP3 Channel Pruning Plug-In for Point-Based Networks

Diffusion-Based Generation Optimization and Planning in 3D Scenes

Diversity-Aware Meta Visual Prompting

Divide and Adapt Active Domain Adaptation via Customized Learning

Egocentric Audio-Visual Object Localization

End-to-End Video Matting With Trimap Propagation

Feature Shrinkage Pyramid for Camouflaged Object Detection With Transformers

Generic-to-Specific Distillation of Masked Autoencoders

Implicit Identity Driven Deepfake Face Swapping Detection

Improving Table Structure Recognition With Visual-Alignment Sequential Coordinate Modeling

Inverting the Imaging Process by Learning an Implicit Camera Model

KiUT Knowledge-Injected U-Transformer for Radiology Report Generation

Learning Accurate 3D Shape Based on Stereo Polarimetric Imaging

Learning Sample Relationship for Exposure Correction

Learning To Measure the Point Cloud Reconstruction Loss in

Local Implicit Ray Function for Generalizable Radiance Field Representation

Neural Kernel Surface Reconstruction

Neural Voting Field for Camera-Space 3D Hand Pose Estimation

Not All Image Regions Matter Masked Vector Quantization for Autoregressiv

Parametric Implicit Face Representation for Audio-Driven Facial Reenactment

Progressive Spatio-Temporal Alignment for Efficient Event-Based Motion Estimation

QuantArt Quantizing Image Style Transfer Towards High Visual Fidelity

RefSR-NeRF Towards High Fidelity and Super Resolution View Synthesis

Rethinking Federated Learning With Domain Shift A Prototype View

Rethinking Few-Shot Medical Segmentation A Vector Quantization View

Revisiting Residual Networks for Adversarial Robustness

Robust Generalization Against Photon-Limited Corruptions via Worst-Case Sharpness Minimization

Self-Supervised AutoFlow

Semi-Supervised 2D Human Pose Estimation Driven by Position Inconsistency Pseudo

SemiCVT Semi-Supervised Convolutional Vision Transformer for Semantic Segmentation

ShapeClipper Scalable 3D Shape Learning From Single-View Images via Geometric

Siamese DET

Style Projected Clustering for Domain Generalized Semantic Segmentation

T-SEA Transfer-Based Self-Ensemble Attack on Object Detection

Towards Accurate Image Coding Improved Autoregressive Image Generation With Dynamic

Tracking Multiple Deformable Objects in Egocentric Videos

Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction

Twin Contrastive Learning With Noisy Labels

VoP Text-Video Co-Operative Prompt Tuning for Cross-Modal Retrieval

Weakly Supervised Temporal Sentence Grounding With Uncertainty-Guided Self-Training

SOOD Towards Semi-Supervised Oriented Object Detection

Bridging Search Region Interaction With Template for RGB-T Tracking

Unifying Layout Generation With a Decoupled Diffusion Model

SplineCam Exact Visualization and Characterization of Deep Network Geometry an

GeoVLN Learning Geometry-Enhanced Visual Representation With Slot Attention for Vision-and-Languag

SimpSON Simplifying Photo Cleanup With Single-Click Distracting Object Segmentation Network

Architecture Dataset and Model-Scale Agnostic Data-Free Meta-Learning

A Dynamic Multi-Scale Voxel Flow Network for Video Prediction

Collaboration Helps Camera Overtake LiDAR in 3D Detection

Complexity-Guided Slimmable Decoder for Efficient Deep Video Compression

Continuous Sign Language Recognition With Correlation Network

Dense Network Expansion for Class Incremental Learning

Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection

Discriminator-Cooperated Feature Map Distillation for GAN Compression

Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos

GFIE A Dataset and Baseline for Gaze-Following From 2D to

Label-Free Liver Tumor Segmentation

NeRF-RPN A General Framework for Object Detection in NeRFs

Physically Realizable Natural-Looking Clothing Textures Evade Person Detectors via 3D

Planning-Oriented Autonomous Driving

Point2Pix Photo-Realistic Point Cloud Rendering via Neural Radiance Fields

REVEAL Retrieval-Augmented Visual-Language Pre-Training With Multi-Source Multimodal Knowledge Memory

Self-Guided Diffusion Models

TriVol Point Cloud Rendering via Triple Volumes

You Only Segment Once Towards Real-Time Panoptic Segmentation

Meta-Explore Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding

Text2Scene Text-Driven Indoor Scene Stylization With Part-Aware Details

Local 3D Editing via 3D Distillation of CLIP Knowledg

Fresnel Microfacet BRDF Unification of Polari-Radiometric Surface-Body Reflection

expOSE Accurate Initialization-Free Projective Factorization Using Exponential Regularization

Scalable Detailed and Mask-Free Universal Photometric Stereo

3D Shape Reconstruction of Semi-Transparent Worms

ScaleFL Resource-Adaptive Federated Learning With Heterogeneous Clients

LayoutDM Discrete Diffusion Model for Controllable Layout Generation

Towards Flexible Multi-Modal Document Models

Bias in Pruned Vision Models In-Depth Analysis and Countermeasures

Exact-NeRF An Exploration of a Precise Volumetric Parameterization for Neural

Improving Image Recognition by Retrieving From Web-Scale Image-Text Dat

Exemplar-FreeSOLO Enhancing Unsupervised Instance Segmentation With Exemplars

Efficient Movie Scene Detection Using State-Space Transformers

RelightableHands Efficient Neural Relighting of Articulated Hand Models

SfM-TTR Using Structure From Motion for Test-Time Refinement of Single-View

Normal-Guided Garment UV Prediction for Human Re-Texturing

A Data-Based Perspective on Transfer Learning

A Meta-Learning Approach to Predicting Performance and Data Requirements

DART Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks

Enhanced Stable View Synthesis

OneFormer One Transformer To Rule Universal Image Segmentation

VectorFusion Text-to-SVG by Abstracting Pixel-Based Diffusion Models

VGFlow Visibility Guided Flow Network for Human Reposing

Difficulty-Based Sampling for Debiased Contrastive Representation Learning

Unsupervised Contour Tracking of Live Cells by Mechanical and Cycl

FlexNeRF Photorealistic Free-Viewpoint Rendering of Moving Humans From Sparse Views

Adversarial Counterfactual Visual Explanations

Beyond mAP Towards Better Evaluation of Instance Segmentation

DistractFlow Improving Optical Flow Estimation via Realistic Distractions and Pseudo-Labeling

Enhancing Multiple Reliability Measures via Nuisance-Extended Information Bottleneck

WinCLIP Zero-Few-Shot Anomaly Classification and Segmentation

Context-Based Trit-Plane Coding for Progressive Image Compression

Genie Show Me the Data for Quantization

Polarimetric iToF Measuring High-Fidelity Depth Through Scattering Medi

A2J-Transformer Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation

AligNeRF High-Fidelity Neural Radiance Fields via Alignment-Aware Training

A Probabilistic Attention Model With Occlusion-Aware Texture Regression for 3D

Color Backdoor A Robust Poisoning Attack in Color Spac

Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval

DartBlur Privacy Preservation With Detection Artifact Suppression

DoNet Deep De-Overlapping Network for Cytology Instance Segmentation

Fair Federated Medical Image Segmentation via Client Contribution Estimation

Hierarchical Discriminative Learning Improves Visual Representations of Biomedical Microscopy

HumanGen Generating Human Radiance Fields With Explicit Priors

Instant-NVR Instant Neural Volumetric Rendering for Human-Object Interactions From Monocul

InstantAvatar Learning Avatars From Monocular Video in 60 Seconds

LayoutFormer Conditional Graphic Layout Generation via Constraint Serialization and Decoding

Masked and Adaptive Transformer for Exemplar Based Image Translation

MixPHM Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering

MotionDiffuser Controllable Multi-Agent Motion Prediction Using Diffusion

Neural Intrinsic Embedding for Non-Rigid Point Cloud Matching

Robust Outlier Rejection for 3D Registration With Variational Bayes

Self-Supervised Pre-Training With Masked Shape Prediction for 3D Scene Understanding

StyleIPSB Identity-Preserving Semantic Basis of StyleGAN for High Fidelity Fac

Understanding and Constructing Latent Modality Structures in Multi-Modal Representation Learning

Learning Attribute and Class-Specific Representation Duet for Fine-Grained Fashion Analysis

MSMDFusion Fusing LiDAR and Camera at Multiple Scales With Multi-Depth

DETRs With Hybrid Matching

Think Twice Before Driving Towards Scalable Decoders for End-to-End Autonomous

Deep Graph Reprogramming

A Unified Pyramid Recurrent Network for Video Frame Interpolation

Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training

Deep Incomplete Multi-View Clustering With Cross-View Partial Sample and Prototy

DNF Decouple and Feedback Network for Seeing in the Dark

Fast Contextual Scene Graph Generation With Unbiased Context Augmentation

Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-Commerc

Long-Tailed Visual Recognition via Self-Heterogeneous Integration With Knowledge Excavation

Multi-Level Logit Distillation

Perspective Fields for Single Image Camera Calibration

Randomized Adversarial Training via Taylor Expansion

ReDirTrans Latent-to-Latent Translation for Gaze and Head Redirection

RefCLIP A Universal Teacher for Weakly Supervised Referring Expression Comprehension

TensoIR Tensorial Inverse Rendering

Video-Text As Game Players Hierarchical Banzhaf Interaction for Cross-Modal Representation

Are Binary Annotations Sufficient Video Moment Retrieval via Hierarchical Uncertainty-Bas

MAP Multimodal Uncertainty-Aware Vision-Language Pre-Training Model

Multispectral Video Semantic Segmentation A Benchmark Dataset and Baselin

Seeing What You Miss Vision-Language Pre-Training With Semantic Completion Learning

Spatial-Temporal Concept Based Explanation of 3D ConvNets

Ultra-High Resolution Segmentation With Ultra-Rich Context A Novel Benchmark

ESLAM Efficient Dense SLAM System Based on Hybrid Representation o

Self-Supervised Representation Learning for CAD

AnyFlow Arbitrary Scale Optical Flow With Implicit Neural Representation

Devils on the Edges Selective Quad Attention for Scene Graph

On the Importance of Accurate Geometry Data for Dense 3D

Distilling Vision-Language Pre-Training To Collaborate With Weakly-Supervised Temporal Action Localization

Human-Art A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes

Principles of Forgetting in Domain-Incremental Semantic Segmentation in Adverse Weath

BiasBed - Rigorous Texture Bias Evaluation

GeoNet Benchmarking Unsupervised Adaptation Across Geographies

A New Path Scaling Vision-and-Language Navigation With Synthetic Instructions an

Benchmarking Self-Supervised Learning on Diverse Pathology Datasets

Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification Segmentation

Meta-Learning With a Geometry-Adaptive Precondition

Scaling Up GANs for Text-to-Image Synthesis

Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in Temporal

Superclass Learning With Representation Enhancement

The Dialog Must Go On Improving Visual Dialog via Generativ

Variational Distribution Learning for Unsupervised Text-to-Image Generation

BlendFields Few-Shot Example-Driven Facial Modeling

Invertible Neural Skinning

Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation

DynamicStereo Consistent Dynamic Depth From Stereo Videos

C-SFDA A Curriculum Learning Aided Self-Training Framework for Efficient Sourc

MED-VT Multiscale Encoder-Decoder Video Transformer With Application To Object Segmentation

HOLODIFFUSION Training a 3D Diffusion Model Using 2D Images

FIANCEE Faster Inference of Adversarial Networks via Conditional Early Exits

HARP Personalized Hand Reconstruction From a Monocular RGB Video

Teleidoscopic Imaging System for Microscale 3D Shape Reconstruction

Imagic Text-Based Real Image Editing With Diffusion Models

2PCNet Two-Phase Consistency Training for Day-to-Night Unsupervised Domain Adaptive Object

Mask-Free Video Instance Segmentation

Neural Preset for Color Style Trans

VILA Learning Image Aesthetics From User Comments With Vision-Language Pretraining

Localized Semantic Feature Mixers for Efficient Pedestrian Detection in Autonomous

Q How To Specialize Large Vision-Language Models to Data-Scarce VQA

Temporally Consistent Online Depth Estimation Using Point-Based Fusion

MaPLe Multi-Modal Prompt Learning

Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting

StyleGAN Salon Multi-View Latent Optimization for Pose-Invariant Hairstyle Trans

Towards Unified Scene Text Spotting Based on Sequence Generation

Achieving a Better Stability-Plasticity Trade-Off via Auxiliary Networks in Continual

Bridging the Gap Between Model Explanations in Partially Annotated Multi-Label

Coreset Sampling From Open-Set for Fine-Grained Self-Supervised Learning

DATID-3D Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generativ

DCFace Synthetic Face Generation With Dual Condition Diffusion Model

Demystifying Causal Features on Adversarial Examples and Causal Inoculation fo

Diffusion Video Autoencoders Toward Temporally Consistent Face Video Editing vi

Event-Based Video Frame Interpolation With Cross-Modal Asymmetric Bidirectional Motion Fields

Feature Separation and Recalibration for Adversarial Robustness

Generalizable Implicit Neural Representations via Instance Pattern Composers

Grounding Counterfactual Explanation of Image Classifiers to Textual Concept Spac

HIER Metric Learning Beyond Class Labels via Hierarchical Regularization

Improving Cross-Modal Retrieval With Set of Diverse Embeddings

MAGVLT Masked Generative Vision-and-Language Transform

NeuralField-LDM Scene Generation With Hierarchical Latent Diffusion Models

On the Stability-Plasticity Dilemma of Class-Incremental Learning

Open-Set Representation Learning Through Combinatorial Embedding

PartMix Regularization Strategy To Learn Part Discovery for Visible-Infrared Person

Re-Thinking Federated Active Learning Based on Inter-Class Diversity

Region-Aware Pretraining for Open-Vocabulary Object Detection With Vision Transformers

Relational Context Learning for Human-Object Interaction Detection

Sampling Is Matter Point-Guided 3D Human Mesh Reconstruction

Shepherding Slots to Objects Towards Stable and Robust Object-Centric Learning

Single Domain Generalization for LiDAR Semantic Segmentation

SMPConv Self-Moving Point Representations for Continuous Convolution

Spatio-Focal Bidirectional Disparity Estimation From a Dual-Pixel Imag

The Devil Is in the Points Weakly Semi-Supervised Instance Segmentation

VNE An Effective Method for Improving Deep Representation by Manipulating

Critical Learning Periods for Multisensory Integration in Deep Networks

X3KD Knowledge Distillation Across Modalities Tasks and Stages for Multi-Cam

Two-Way Multi-Label Loss

Explaining Image Classifiers With Multiscale Directional Image Representation

Picture That Sketch Photorealistic Image Generation From Abstract Sketches

Multi-Label Compound Expression Recognition C-EXPR Database Network

Solving Relaxations of MAP-MRF Problems Combinatorial In-Face Frank-Wolfe Directions

Octree Guided Unoriented Surface Reconstruction

Efficient Frequency Domain-Based Transformers for High-Quality Image Deblurring

Indescribable Multi-Modal Spatial Evaluato

LaserMix for Semi-Supervised LiDAR Semantic Segmentation

Understanding Masked Autoencoders via Hierarchical Latent Variable Models

Understanding Masked Image Modeling via Learning Occlusion Invariant Featu

vMAP Vectorised Object Mapping for Neural Field SLAM

One-Shot Model for Mixed-Precision Quantization

Cross-Image-Attention for Conditional Embeddings in Deep Metric Learning

Passive Micron-Scale Time-of-Flight With Sunlight Interferometry

Swept-Angle Synthetic Wavelength Interferometry

MELTR Meta Loss Transformer for Learning To Fine-Tune Video Foundation

Iterative Vision-and-Language Navigation

PaletteNeRF Palette-Based Appearance Editing of Neural Radiance Fields

Putting People in Their Place Affordance-Aware Human Insertion Into Scenes

StarCraftImage A Dataset for Prototyping Spatial Reasoning Methods for Multi-Agent

Learning To Predict Scene-Level Implicit 3D From Posed RGBD Dat

Multi-Concept Customization of Text-to-Image Diffusion

Few-Shot Referring Relationships in Videos

MethaneMapper Spectral Absorption Aware Hyperspectral Transformer for Methane Detection

Normalizing Flow Based Feature Synthesis for Outlier-Aware Object Detection

IS-GGT Iterative Scene Graph Generation With Generative Transformers

HAAV Hierarchical Aggregation of Augmented Views for Image Captioning

Weakly Supervised Semantic Segmentation via Adversarial Learning of Classifier an

Probabilistic Prompt Learning for Dense Prediction

Renderable Neural Radiance Map for Visual Navigation

Spherical Transformer for LiDAR-Based 3D Recognition

Fantastic Breaks A Dataset of Paired 3D Scans of Real-Worl

SCOOP Self-Supervised Correspondence and Optimization-Based Scene Flow

Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion

Vision Transformers Are Good Mask Auto-Labelers

Simultaneously Short- and Long-Term Temporal Modeling for Semi-Supervised Video Semantic

FitMe Deep Photorealistic 3D Morphable Model Avatars

FFCV Accelerating Training by Removing Data Bottlenecks

BAAM Monocular 3D Pose and Shape Reconstruction With Bi-Contextual Attention

Decomposed Cross-Modal Distillation for RGB-Based Temporal Action Detection

Decompose Adjust Compose Effective Normalization by Playing With Frequency fo

DP-NeRF Deblurred Neural Radiance Field With Physical Scene Priors

Exploring Discontinuity for Video Frame Interpolation

Fix the Noise Disentangling Source Feature for Controllable Domain Translation

Human Pose Estimation in Extremely Low-Light Conditions

Im2Hands Learning Attentive Implicit Representation of Interacting Two-Hand Shapes

Learning Geometry-Aware Representations by Sketching

Learning Rotation-Equivariant Features for Visual Correspondenc

Multimodal Prompting With Missing Modalities for Visual Recognition

Paired-Point Lifting for Enhanced Privacy-Preserving Visual Localization

Revisiting Self-Similarity Structural Embedding for Image Retrieval

Shape-Aware Text-Driven Layered Video Editing

Single View Scene Scale Estimation Using Scale Fiel

TTA-COPE Test-Time Adaptation for Category-Level Object Pose Estimation

A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction

Blind Video Deflickering by Neural Filtering With a Flawed Atlas

EFEM Equivariant Neural Field Expectation Maximization for 3D Object Segmentation

PyramidFlow High-Resolution Defect Contrastive Localization Using Pyramid Normalizing Flow

RGBD2 Generative Scene Synthesis via Incremental View Inpainting Using RGBD

SliceMatch Geometry-Guided Aggregation for Cross-View Pose Estimation

SeaThru-NeRF Neural Radiance Fields in Scattering Medi

Data-Efficient Large Scale Place Recognition With Graded Similarity Supervision

GamutMLP A Lightweight MLP for Color Loss Recovery

Music-Driven Group Choreography

Adaptive Plasticity Improvement for Continual Learning

CrowdCLIP Unsupervised Crowd Counting via Vision-Language Model

HelixSurf A Robust and Efficient Neural Implicit Surface Learning o

Open-Vocabulary Semantic Segmentation With Mask-Adapted CLI

StyLess Boosting the Transferability of Adversarial Examples

Unknown Sniffer for Object Detection Don't Turn a Blind Ey

Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving

Bootstrapping Objectness From Videos by Relaxed Common Fate and Visual

Adaptive Channel Sparsity for Federated Learning Under System Heterogeneity

AttentionShift Iteratively Estimated Part-Based Attention Map for Pointly Supervised Instanc

A Light Weight Model for Active Speaker Detection

EMT-NAS Transferring Architectural Knowledge Between Tasks From Different Datasets

High-Fidelity Clothed Avatar Reconstruction From a Single Imag

Revisiting Rolling Shutter Bundle Adjustment Toward Accurate and Fast Solution

BiasAdv Bias-Adversarial Augmentation for Model Debiasing

Learning Optical Expansion From Scale Matching

PanoSwin A Pano-Style Swin Transformer for Panorama Understanding

ShadowNeuS Neural SDF Reconstruction by Shadow Ray Supervision

Actionlet-Dependent Contrastive Learning for Unsupervised Skeleton-Based Action Recognition

Adaptive Human Matting for Dynamic Videos

Being Comes From Not-Being Open-Vocabulary Text-to-Motion Generation With Wordless Training

Bit-Shrinking Limiting Instantaneous Sharpness for Improving Post-Training Quantization

Catch Missing Details Image Reconstruction With Frequency Augmented Variational Autoenco

CLIP Is Also an Efficient Segmenter A Text-Driven Approach fo

Collaborative Static and Dynamic Vision-Language Streams for Spatio-Temporal Video Grounding

Cross-Domain 3D Hand Pose Estimation With Dual Modalities

Deep Frequency Filtering for Domain Generalization

DynamicDet A Unified Dynamic Architecture for Object Detection

ERM-KTP Knowledge-Level Machine Unlearning via Knowledge Trans

Harmonious Feature Learning for Interactive Hand-Object Pose Estimation

Interventional Bag Multi-Instance Learning on Whole-Slide Pathological Images

Learning To Detect Mirrors From Videos via Dual Correspondences

Magic3D High-Resolution Text-to-3D Content Creation

Memory-Friendly Scalable Super-Resolution via Rewinding Lottery Ticket Hypothesis

Meta Architecture for Point Cloud Analysis

Multimodality Helps Unimodality Cross-Modal Few-Shot Learning With Multimodal Models

Neural Scene Chronology

One-Stage 3D Whole-Body Mesh Recovery With Component Aware Transform

Optimal Transport Minimization Crowd Localization on Density Maps for Semi-Supervis

PCR Proxy-Based Contrastive Replay for Online Class-Incremental Continual Learning

Supervised Masked Knowledge Distillation for Few-Shot Transformers

Towards Fast Adaptation of Pretrained Contrastive Models for Multi-Channel Video-Languag

Video Test-Time Adaptation for Action Recognition

Vision Transformers Are Parameter-Efficient Audio-Visual Learners

Zero-Shot Everything Sketch-Based Image Retrieval and in Explainable Styl

Guiding Pseudo-Labels With Uncertainty Estimation for Source-Free Unsupervised Domain Adaptation

3D Line Mapping Revisit

AdaptiveMix Improving GAN Training via Feature Space Shrinkag

Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection

A Soma Segmentation Benchmark in Full Adult Fly Brain

Bitstream-Corrupted JPEG Images Are Restorable Two-Stage Compensation and Alignment Framework

Building Rearticulable Models for Arbitrary 3D Objects From 4D Point

CIGAR Cross-Modality Graph Reasoning for Domain Adaptive Object Detection

Class Adaptive Network Calibration

Continual Detection Transformer for Incremental Object Detection

COT Unsupervised Domain Adaptation With Clustering and Optimal Transport

DA Wand Distortion-Aware Selection Using Neural Mesh Parameterization

DegAE A New Pretraining Paradigm for Low-Level Vision

Delving Into Discrete Normalizing Flows on SO3 Manifold for Probabilistic

Delving Into Shape-Aware Zero-Shot Semantic Segmentation

Delving StyleGAN Inversion for Image Editing A Foundation Latent Spac

Detecting Backdoors During the Inference Stage Based on Corruption Robustness

Diversity-Measurable Anomaly Detection

DualVector Unsupervised Vector Font Synthesis With Dual-Part Representation

EfficientViT Memory Efficient Vision Transformer With Cascaded Group Attention

Explicit Visual Prompting for Low-Level Structure Segmentations

Exploring the Relationship Between Architectural Design and Adversarially Robust Generalization

FAC 3D Representation Learning via Foreground Aware Feature Contrast

Few-Shot Non-Line-of-Sight Imaging With Signal-Surface Collaborative Regularization

Fine-Grained Face Swapping via Regional GAN Inversion

FlatFormer Flattened Window Attention for Efficient Point Cloud Transform

FlowGrad Controlling the Output of Generative ODEs With Gradients

Generating Anomalies for Video Anomaly Detection With Prompt-Based Feature Mapping

GEN Pushing the Limits of Softmax-Based Out-of-Distribution Detection

GRES Generalized Referring Expression Segmentation

Hierarchical Prompt Learning for Multi-Task Learning

Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object

Humans As Light Bulbs 3D Human Reconstruction From Thermal Reflection

InstMove Instance Motion for Object-Centric Video Segmentation

Joint HDR Denoising and Fusion A Real-World Mobile HDR Image

Learned Image Compression With Mixed Transformer-CNN Architectures

Learning Customized Visual Models With Retrieval-Augmented Knowledge

Learning Orthogonal Prototypes for Generalized Few-Shot Semantic Segmentation

LEMaRT Label-Efficient Masked Region Transform for Image Harmonization

Marching-Primitives Shape Abstraction From Signed Distance Function

MarS3D A Plug-and-Play Motion-Aware Model for Semantic Segmentation on Multi-Scan

MixMAE Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical

MixTeacher Mining Promising Labels With Mixed Scale Teacher for Semi-Supervised

ML2P-Encoder On Exploration of Channel-Class Correlation for Multi-Label Zero-Shot Learning

MMVC Learned Multi-Mode Video Compression With Block-Based Prediction Mode Selection

Multiple Instance Learning via Iterative Self-Paced Supervised Contrastive Learning

NeUDF Leaning Neural Unsigned Distance Fields With Volume Rendering

NoisyQuant Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers

OSAN A One-Stage Alignment Network To Unify Multimodal Alignment an

PartSLIP Low-Shot Part Segmentation for 3D Point Clouds via Pretrain

PD-Quant Post-Training Quantization Based on Prediction Difference Metric

PolyFormer Referring Image Segmentation As Sequential Polygon Generation

Pose-Disentangled Contrastive Learning for Self-Supervised Facial Representation

PoseExaminer Automated Testing of Out-of-Distribution Robustness in Human Pose an

Progressive Neighbor Consistency Mining for Correspondence Pruning

Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning

Promoting Semantic Connectivity Dual Nearest Neighbors Contrastive Learning for Unsupervised

Reducing the Label Bias for Timestamp Supervised Temporal Action Segmentation

Revisiting Temporal Modeling for CLIP-Based Image-to-Video Knowledge Transferring

RIATIG Reliable and Imperceptible Adversarial Text-to-Image Generation With Natural Prompts

Robust Dynamic Radiance Fields

SAP-DETR Bridging the Gap Between Salient Points and Queries-Based Transform

SCOTCH and SODA A Transformer Video Shadow Detection Framework

Semantic Ray Learning a Generalizable Semantic Field With Cross-Reprojection Attention

Semi-Weakly Supervised Object Kinematic Motion Prediction

SimpleNet A Simple Network for Image Anomaly Detection and Localization

Single Image Depth Prediction Made Better A Multivariate Gaussian Take

Slimmable Dataset Condensation

SlowLiDAR Increasing the Latency of LiDAR-Based Detection Using Adversarial Examples

Soft Augmentation for Image Classification

Spectral Bayesian Uncertainty for Image Super-Resolution

StyleRF Zero-Shot 3D Style Transfer of Neural Radiance Fields

SynthVSR Scaling Up Visual Speech Recognition With Synthetic Supervision

Target-Referenced Reactive Grasping for Dynamic Objects

TWINS A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness

Unsupervised Continual Semantic Adaptation Through Neural Rendering

VLPD Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision

What You Can Reconstruct From a Shadow

3D-Aware Face Swapping

3D-Aware Multi-Class Image-to-Image Translation With NeRFs

3D Cinemagraphy From a Single Image

ACSeg Adaptive Conceptualization for Unsupervised Semantic Segmentation

Adjustment and Alignment for Unbiased Open Set Domain Adaptation

Adversarially Masking Synthetic To Mimic Real Adaptive Noise Injection for

AMT All-Pairs Multi-Field Transforms for Efficient Frame Interpolation

An In-Depth Exploration of Person Re-Identification and Gait Recognition in

Are Data-Driven Explanations Robust Against Out-of-Distribution Data

AShapeFormer Semantics-Guided Object-Level Active Shape Encoding for 3D Object Detection

Azimuth Super-Resolution for FMCW Radar in Autonomous Driving

A Simple Baseline for Video Restoration With Grouped Spatial-Temporal Shift

A Whac-a-Mole Dilemma Shortcuts Come in Multiples Where Mitigating On

BBDM Image-to-Image Translation With Brownian Bridge Diffusion Models

BioNet A Biologically-Inspired Network for Face Recognition

Boosting Low-Data Instance Segmentation by Unsupervised Pre-Training With Saliency Prompt

Boosting Weakly-Supervised Temporal Action Localization With Text Information

Causally-Aware Intraoperative Imputation for Overall Survival Time Prediction

Center Focusing Network for Real-Time LiDAR Panoptic Segmentation

Class Balanced Adaptive Pseudo Labeling for Federated Semi-Supervised Learning

Compressing Volumetric Radiance Fields to 1 MB

Correlational Image Modeling for Self-Supervised Visual Pre-Training

DANI-Net Uncalibrated Photometric Stereo by Differentiable Shadow Handling Anisotropic Reflectance

DATE Domain Adaptive Product Seeker for E-Commerce

Decoupled Multimodal Distilling for Emotion Recognition

Deep Random Projector Accelerated Deep Image Prior

Diffusion-SDF Text-To-Shape via Voxelized Diffusion

Discrete Point-Wise Attack Is Not Enough Generalized Manifold Adversarial Attack

Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection

DISC Learning From Noisy Labels via Dynamic Instance-Specific Selection an

DropKey for Vision Transform

DSFNet Dual Space Fusion Network for Occlusion-Robust 3D Dense Face

DynaMask Dynamic Mask Selection for Instance Segmentation

Dynamic Graph Enhanced Contrastive Learning for Chest X-Ray Report Generation

DynIBaR Neural Dynamic Image-Based Rendering

Edge-Aware Regional Message Passing Controller for Image Forgery Localization

Efficient and Explicit Modelling of Image Hierarchies for Image Restoration

Efficient Multimodal Fusion via Interactive Prompting

Ego-Body Pose Estimation via Ego-Head Pose Estimation

Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language

FCC Feature Clusters Compression for Long-Tailed Visual Recognition

Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process

GLIGEN Open-Set Grounded Text-to-Image Generation

Guided Recommendation for Model Fine-Tuning

Hard Sample Matters a Lot in Zero-Shot Quantization

ImageNet-E Benchmarking Neural Network Robustness via Attribute Editing

Improving Vision-and-Language Navigation by Generating Future-View Image Semantics

Inverse Rendering of Translucent Objects Using Physical and Neural Renderers

KERM Knowledge Enhanced Reasoning for Vision-and-Language Navigation

LAVENDER Unifying Video-Language Understanding As Masked Language Modeling

Learning Distortion Invariant Representation for Image Restoration From a Causality

Learning Generative Structure Prior for Blind Text Image Super-Resolution

Learning Steerable Function for Efficient Image Resampling

Learning To Fuse Monocular and Multi-View Cues for Multi-Frame Depth

Less Is More Reducing Task and Model Complexity for 3D

Lift3D Synthesize 3D Training Data by Lifting 2D GAN to

Lite DETR An Interleaved Multi-Scale Encoder for Efficient DETR

LOCATE Localize and Transfer Object Parts for Weakly Supervised Affordance

LoGoNet Towards Accurate 3D Object Detection With Local-to-Global Cross-Modal Fusion

Long Range Pooling for 3D Large-Scale Scene Understanding

MAGE MAsked Generative Encoder To Unify Representation Learning and Image

Mask DINO Towards a Unified Transformer-Based Framework for Object Detection

MDQE Mining Discriminative Query Embeddings To Segment Occluded Instances on

MEGANE Morphable Eyeglass and Avatar Network

Metadata-Based RAW Reconstruction via Implicit Neural Functions

MobileBrick Building LEGO for 3D Reconstruction on Mobile Devices

MoDAR Using Motion Forecasting for 3D Object Detection in Point

Modeling Inter-Class and Intra-Class Constraints in Novel Class Discovery

MSeg3D Multi-Modal 3D Semantic Segmentation for Autonomous Driving

Multi-View Inverse Rendering for Large-Scale Real-World Indoor Scenes

Neuralangelo High-Fidelity Neural Surface Reconstruction

Neural Video Compression With Diverse Contexts

NIKI Neural Inverse Kinematics With Invertible Neural Networks for 3D

NLOST Non-Line-of-Sight Imaging With Transform

OmniCity Omnipotent City Understanding With Multi-Level and Multi-View Images

One-Shot High-Fidelity Talking-Head Synthesis With Deformable Neural Radiance Field

One-to-Few Label Assignment for End-to-End Dense Detection

On the Effectiveness of Partial Variance Reduction in Federated Learning

Open-Set Semantic Segmentation for Point Clouds via Adversarial Prototype Framework

OVTrack Open-Vocabulary Multiple Object Tracking

Patch-Based 3D Natural Scene Generation From a Single Example

Photo Pre-Training but for Sketch

Physical-World Optical Adversarial Attacks on 3D Face Recognition

PillarNeXt Rethinking Network Designs for 3D Object Detection in LiDAR

Polarized Color Image Denoising

PREIM3D 3D Consistent Precise Image Attribute Editing From a Single

ProxyFormer Proxy Alignment Assisted Point Cloud Completion With Missing Part

Referring Image Matting

Regularize Implicit Neural Representation by Itself

Rethinking Feature-Based Knowledge Distillation for Face Recognition

Rethinking Out-of-Distribution OOD Detection Masked Image Modeling Is All You

Robust Model-Based Face Reconstruction Through Weakly-Supervised Outlier Segmentation

Scaling Language-Image Pre-Training via Masking

ScarceNet Animal Pose Estimation With Scarce Annotations

SCConv Spatial and Channel Reconstruction Convolution for Feature Redundancy

SECAD-Net Self-Supervised CAD Reconstruction by Learning Sketch-Extrude Operations

Self-Supervised Blind Motion Deblurring With Deep Expectation Maximization

SGLoc Scene Geometry Encoding for Outdoor LiDAR Localization

SHS-Net Learning Signed Hyper Surfaces for Oriented Normal Estimation o

Sibling-Attack Rethinking Transferable Adversarial Attacks Against Face Recognition

SIM Semantic-Aware Instance Mask Generation for Box-Supervised Instance Segmentation

Source-Free Video Domain Adaptation With Spatial-Temporal-Historical Consistency Learning

Spatial-Then-Temporal Self-Supervised Learning for Video Correspondence

Spatially Adaptive Self-Supervised Learning for Real-World Image Denoising

Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising

SteerNeRF Accelerating NeRF Rendering via Smooth Viewpoint Trajectory

StyleGene Crossover and Mutation of Region-Level Facial Genes for Kinship

Super-CLEVR A Virtual Benchmark To Diagnose Domain Robustness in Visual

SViTT Temporal Learning of Sparse Video-Text Transformers

Task-Specific Fine-Tuning via Variational Information Bottleneck for Weakly-Supervised Pathology Whole

Token Boosting for Robust Self-Supervised Visual Transformer Pre-Training

ToThePoint Efficient Contrastive Learning of 3D Point Clouds via Recycling

Towards Benchmarking and Assessing Visual Naturalness of Physical World Adversarial

Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting

Trade-Off Between Robustness and Accuracy of Vision Transformers

Uni-Perceiver v2 A Generalist Model for Large-Scale Vision and Vision-Language

Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation

VoxFormer Sparse Voxel Transformer for Camera-Based 3D Semantic Scene Completion

Weakly Supervised Class-Agnostic Motion Prediction for Autonomous Driving

WINNER Weakly-Supervised hIerarchical decompositioN and aligNment for Spatio-tEmporal Video gRounding

Beyond Attentive Tokens Incorporating Token Importance and Diversity for Efficient

CapDet Unifying Dense Captioning and Open-World Detection Pretraining

NeuralUDF Learning Unsigned Distance Fields for Multi-View Reconstruction of Surfaces

PointClustering Unsupervised Point Cloud Pre-Training Using Transformation Invariance in Clustering

All-in-Focus Imaging From Event Focal Stack

Spatio-Temporal Pixel-Level Contrastive Learning-Based Source-Free Domain Adaptation for Video Semantic

High Fidelity 3D Hand Shape Reconstruction via Scalable Graph Frequency

Camouflaged Instance Segmentation via Explicit De-Camouflaging

Class-Incremental Exemplar Compression for Class-Incremental Learning

Constrained Evolutionary Diffusion Filter for Monocular Endoscope Tracking

GeoLayoutLM Geometric Pre-Training for Visual Information Extraction

GradMA A Gradient-Memory-Based Accelerated Federated Learning With Alleviated Catastrophic Forgetting

Leverage Interactive Affinity for Affordance Learning

MOT Masked Optimal Transport for Partial Domain Adaptation

RaBit Parametric Modeling of 3D Biped Cartoon Characters With

Semantic-Conditional Diffusion Networks for Image Captioning

SIEDOB Semantic Image Editing by Disentangling Object and Background

Towards Generalisable Video Moment Retrieval Visual-Dynamic Injection to Image-Text Pre-Training

Zero-Shot Model Diagnosis

Content-Aware Token Sharing for Efficient Semantic Segmentation With Vision Transformers

Decomposed Soft Prompt Guided Fusion Enhancing for Compositional Zero-Shot Learning

Learning Spatial-Temporal Implicit Neural Representations for Event-Guided Video Super-Resolution

LinK Linear Kernel for LiDAR-Based 3D Perception

Markerless Camera-to-Robot Pose Estimation via Self-Supervised Sim-to-Real Trans

Neuron Structure Modeling for Generalizable Remote Physiological Measurement

Open-Vocabulary Point-Cloud Object Detection Without 3D Annotation

PADA Jointly Sampling Path and Data for Consistent NAS

Robust and Scalable Gaussian Process Regression and Its Applications

Specialist Diffusion Plug-and-Play Sample-Efficient Fine-Tuning of Text-to-Image Diffusion Models To

TransFlow Transformer As Flow Learn

Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection

Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images

Improving Generalization With Domain Convex Game

Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection

Box-Level Active Detection

Controllable Mesh Generation Through Sparse Latent Point Diffusion Models

Heterogeneous Continual Learning

Tunable Convolutions With Parametric Multi-Loss Optimization

Transfer4D A Framework for Frugal Motion Capture and Deformation Trans

Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss

NIRVANA Neural Implicit Representations of Videos With Adaptive Networks an

DualRel Semi-Supervised Mitochondria Segmentation From a Prototype Perspective

Chat2Map Efficient Scene Mapping From Multi-Ego Conversations

Change-Aware Sampling and Contrastive Learning for Satellite Images

Zero-Shot Noise2Noise Efficient Image Denoising Without Any Data

BEV-Guided Multi-Modality Fusion for Driving Perception

Doubly Right Object Recognition A Why Prompt for Visual Rationales

Leapfrog Diffusion Model for Stochastic Trajectory Prediction

Computational Flash Photography Through Intrinsics

SLACK Stable Learning of Augmentations With Cold-Start and KL Regularization

3D Human Mesh Estimation From Virtual Markers

3D Video Loops From Asynchronous Input

Annealing-Based Label-Transfer Learning for Open World Object Detection

CAT LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object

CREPE Can Vision-Language Foundation Models Reason Compositionally

Curvature-Balanced Feature Manifold Learning for Long-Tailed Classification

DiGeo Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection

Dynamic Aggregated Network for Gait Recognition

OTAvatar One-Shot Talking Face Avatar With Controllable Tri-Plane Rendering

ProD Prompting-To-Disentangle Domain Knowledge for Cross-Domain Few-Shot Image Classification

Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective

Symmetric Shape-Preserving Autoencoder for Unsupervised Real Scene Point Cloud Completion

Towards Better Gradient Consistency for Neural Signed Distance Functions via

Language-Guided Music Recommendation for Video via Prompt Analogies

Spring A High-Resolution High-Detail Dataset and Benchmark for Scene Flow

Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement

Deep Polarization Reconstruction With PDAVIS Events

Exploring and Utilizing Pattern Imbalance

LightPainter Interactive Portrait Relighting With Freehand Scribble

Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration

PC2 Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction

RealFusion 360deg Reconstruction of Any Object From a Single Image

Modality-Invariant Visual Odometry for Embodied Vision

Detection Hub Unifying Object Detection Datasets via Query Adaptation on

NeAT Learning Neural Implicit Surfaces With Arbitrary Topologies From Multi-View

On Distillation of Guided Diffusion Models

Data-Driven Feature Tracking for Event Cameras

DivClust Controlling Diversity in Deep Clustering

Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures

Guided Depth Super-Resolution by Deep Anisotropic Diffusion

Progressively Optimized Local Radiance Fields for Robust View Synthesis

Unsupervised Space-Time Network for Temporally-Consistent Segmentation of Multiple Motions

Realistic Saliency Guided Image Enhancement

FedSeg Class-Heterogeneous Federated Learning for Semantic Segmentation

Recurrence Without Recurrence Stable Video Landmark Detection With Deep Equilibrium

Alias-Free Convnets Fractional Shift Invariance via Polynomial Activations

MobileVOS Real-Time Video Object Segmentation Contrastive Learning Meets Knowledge Distillation

Deep Dive Into Gradients Better Optimization for 3D Object Detection

NeurOCS Neural NOCS Supervision for Monocular 3D Object Localization

SPIn-NeRF Multiview Segmentation and Perceptual Inpainting With Neural Radiance Fields

ActMAD Activation Matching To Align Distributions for Test-Time-Training

Ranking Regularization for Critical Rare Classes Minimizing False Positives at

NULL-Text Inversion for Editing Real Images Using Guided Diffusion Models

Learning Action Changes by Measuring Verb-Adverb Textual Relationships

Gazeformer Scalable Effective and Fast Prediction of Goal-Directed Human Attention

Bringing Inputs to Shared Domains for 3D Interacting Hands Recovery

Query-Dependent Video Representation for Moment Retrieval and Highlight Detection

Large-Capacity and Flexible Video Steganography via Invertible Neural Network

Audio-Visual Grouping Network for Sound Localization From Mixtures

Continuous Intermediate Token Learning With Implicit Motion Manifold for Keyframe

Event-Based Shape From Polarization

Learning Correspondence Uncertainty via Differentiable Nonlinear Least Squares

Deep Deterministic Uncertainty A New Simple Baseline

Open Vocabulary Semantic Segmentation With Patch Aligned Contrastive Learning

DiffRF Rendering-Guided 3D Radiance Field Diffusion

Bridging Precision and Confidence A Train-Time Loss for Calibrating Object

EC2 Emergent Communication for Embodied Control

Progressive Backdoor Erasing via Connecting Backdoor and Adversarial Attacks

I2MVFormer Large Language Model Generated Multi-View Document Supervision for Zero-Shot

Tangentially Elongated Gaussian Belief Propagation for Event-Based Incremental Optical Flow

Post-Processing Temporal Action Detection

Unbiased Scene Graph Generation in Videos

3D-POP - An Automated Annotation Approach to Facilitate Markerless 2D-3D

Unite and Conquer Plug Play Multi-Modal Synthesis Using Diffusion

Sparse Multi-Modal Graph Transformer With Shared-Context Processing for Representation Learning

DF-Platter Multi-Face Heterogeneous Deepfake Dataset

ProtoCon Pseudo-Label Refinement via Online Clustering and Prototypical Consistency for

PIP-Net Patch-Based Intuitive Prototypes for Interpretable Image Classification

DARE-GRAM Unsupervised Domain Adaptation Regression by Aligning Inverse Gram Matrices

ISBNet A 3D Point Cloud Instance Segmentation Network With Instance-Aware

Efficient Scale-Invariant Generator With Column-Row Entangled Pixel Synthesis

Micron-BERT BERT-Based Facial Micro-Expression Recognition

Re-Thinking Model Inversion Attacks Against Deep Neural Networks

TIPI Test Time Adaptation With Transformation Invariance

Bilateral Memory Consolidation for Continual Learning

Learning 3D Scene Priors With 2D Supervision

HOICLIP Efficient Knowledge Transfer for HOI Detection With Vision-Language Models

Trap Attention Monocular Depth Estimation With Manual Traps

Domain Expansion of Image Generators

Visibility Constrained Wide-Band Illumination Spectrum Design for Seeing-in-the-Dark

Conditional Image-to-Video Generation With Latent Flow Diffusion Models

NUWA-LIP Language-Guided Image Inpainting With Defect-Free VQGAN

PATS Patch Area Transportation With Subdivision for Local Feature Matching

Disentangled Representation Learning for Unsupervised Neural Quantization

Adaptive Global Decay Process for Event Cameras

Temporal Consistent 3D LiDAR Representation Learning for Semantic Perception in

Neural Congealing Aligning Images to a Joint Semantic Atlas

AssemblyHands Towards Egocentric Activity Understanding via 3D Hand Pose Estimation

BlackVIP Black-Box Visual Prompting for Robust Transfer Learning

Recovering 3D Hand Mesh Sequence From a Single Blurry Image

Towards Universal Fake Image Detectors That Generalize Across Generative Models

Towards Building Self-Aware Object Detectors via Reliable Uncertainty Quantification an

Detection of Out-of-Distribution Samples Using Binary Neuron Activation Patterns

Cross-GAN Auditing Unsupervised Identification of Attribute Level Similarities and Differences

Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation

DyNCA Real-Time Dynamic Texture Synthesis Using Neural Cellular Automata

B-Spline Texture Coefficients Estimator for Screen Content Image Super-Resolution

Implicit View-Time Interpolation of Stereo Videos Using Multi-Plane Disparities an

Visual Localization Using Imperfect 3D Models From the Internet

Backdoor Cleansing With Unlabeled Data

DPE Disentanglement of Pose and Expression for General Video Portrait

Standing Between Past and Future Spatio-Temporal Modeling for Multi-Camera 3D

Unsupervised 3D Point Cloud Representation Learning by Triangle Constrained Contrast

BAEFormer Bi-Directional and Early Interaction Transformers for Birds Eye View

Boundary-Aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval

Cloud-Device Collaborative Adaptation to Continual Changing Environments in the Real-World

Deep Discriminative Spatial and Temporal Network for Efficient Video Deblurring

Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network

Slide-Transformer Hierarchical Vision Transformer With Local Self-Attention

Stitchable Neural Networks

Towards Open-World Segmentation of Parts

Learning To Name Classes for Vision and Language Models

All-in-One Image Restoration for Unknown Degradations Using Adaptive Discriminative Filters

BiFormer Learning Bilateral Motion Estimation via Bilateral Transformer for 4K

Dual-Path Adaptation From Image to Video Transformers

LANIT Language-Driven Image-to-Image Translation for Unlabeled Data

Mask-Guided Matting in the Wild

Multi-Modal Representation Learning With Text-Driven Soft Masks

Perception-Oriented Single Image Super-Resolution Using Optimal Objective Estimation

RGB No More Minimally-Decoded JPEG Vision Transformers

Self-Positioning Point-Based Transformer for Point Cloud Understanding

Temporal Interpolation Is All You Need for Dynamic Neural Radiance

Training Debiased Subnetworks With Contrastive Weight Pruning

ViPLO Vision Transformer Based Pose-Conditioned Self-Loop Graph for Human-Object Interaction

Learning To Retain While Acquiring Combating Distribution-Shift in Adversarial Data-F

Sequential Training of GANs Against GAN-Classifiers Reveals Correlated Knowledge Gaps

Multiclass Confidence and Localization Calibration for Object Detection

DeepLSD Line Segment Detection and Refinement With Deep Image Gradients

Shape Pose and Appearance From a Single Image via Bootst

Megahertz Light Steering Without Moving Parts

StyleRes Transforming the Residuals for Real Image Editing With StyleGAN

CLIPPING Distilling CLIP-Based Models With a Student Base for Video-Language

Re-Basin via Implicit Sinkhorn Differentiation

Hierarchical Dense Correlation Distillation for Few-Shot Segmentation

On the Convergence of IRLS and Its Variants in Outlier-Robust

OpenScene 3D Scene Understanding With Open Vocabularies

Perception and Semantic Aware Regularization for Sequential Confidence Calibration

Representing Volumetric Videos As Dynamic MLP Maps

Trajectory-Aware Body Interaction Transformer for Multi-Person Pose Forecasting

Use Your Head Improving Long-Tail Video Recognition

pCON Polarimetric Coordinate Networks for Neural Scene Representations

Object Pop-Up Can We Infer 3D Objects and Their Poses

HyperCUT Video Sequence From a Single Blurry Image Using Unsupervised

Wavelet Diffusion Models Are Fast and Scalable Image Generators

iDisc Internal Discretization for Monocular Depth Estimation

Rethinking Video ViTs Sparse Video Tubes for Joint Image an

SegLoc Learning Segmentation-Based Representations for Privacy-Preserving Visual Localization

Handwritten Text Generation From Visual Archetypes

DynaFed Tackling Client Data Heterogeneity With Global Dynamics

Frame Interpolation Transformer and Uncertainty Guidance

GlassesGAN Eyewear Personalization Using Synthetic Appearance Discovery and Targeted Subspace

Robust Unsupervised StyleGAN Image Restoration

Handy Towards a High Fidelity 3D Hand Shape and Appearance

Enhancing Deformable Local Features by Jointly Learning To Detect an

Computationally Budgeted Continual Learning What Does Matter

DINER Depth-Aware Image-Based NEural Radiance Fields

Dynamic Conceptional Contrastive Learning for Generalized Category Discovery

Adaptive Data-Free Quantization

End-to-End Vectorized HD-Map Construction With Piecewise Bezier Curve

Fuzzy Positive Learning for Semi-Supervised Semantic Segmentation

Bi-Level Meta-Learning for Few-Shot Domain Generalization

Class-Balancing Diffusion Models

Deep Graph-Based Spatial Consistency for Robust Non-Rigid Point Cloud Registration

FreeSeg Unified Universal and Open-Vocabulary Image Segmentation

Ground-Truth Free Meta-Learning for Deep Compressive Sampling

Learning To Exploit the Sequence-Specific Prior Knowledge for Image Processing

MotionTrack Learning Robust Short-Term and Long-Term Motions for Multi-Object Tracking

Reliable and Interpretable Personalized Federated Learning

Robust 3D Shape Classification via Non-Local Graph Attention Network

CafeBoost Causal Feature Boost To Eliminate Task-Induced Bias for Class

Graph Representation for Order-Aware Visual Transformation

Looking Through the Glass Neural Surface Reconstruction Against High Specul

PSVT End-to-End Multi-Person 3D Pose and Shape Estimation With Progressive

REC-MV REconstructing 3D Dynamic Cloth From Monocular Videos

Diverse 3D Hand Gesture Prediction From Body Dynamics by Bilateral

Motion Information Propagation for Neural Video Compression

Real-Time 6K Image Rescaling With Rate-Distortion Optimization

Bias Mimicking A Simple Sampling Approach for Bias Mitigation

Neumann Network With Recursive Kernels for Single Image Defocus Deblurring

A Characteristic Function-Based Method for Bottom-Up Human Pose Estimation

How To Prevent the Poor Performance Clients for Personalized Federated

Learning To Segment Every Referring Object Point by Point

Modality-Agnostic Debiasing for Single Domain Generalization

SketchXAI A First Look at Explainability for Human Sketches

Towards Robust Tampered Text Detection in Document Image New Dataset

Upcycling Models Under Domain and Category Shift

MoDi Unconditional Motion Synthesis From Diverse Data

Filtering Distillation and Hard Negatives for Vision-Language Pre-Training

Ambiguous Medical Image Segmentation Using Diffusion Models

Learning Partial Correlation Based Deep Visual Representation for Image Classification

Make-a-Story Visual Memory Conditioned Consistent Story Generation

Infinite Photorealistic Worlds Using Procedural Generation

On the Benefits of 3D Pose and Tracking for Human

NaQ Leveraging Narrations As Queries To Supervise Episodic Memory

PACO Parts and Attributes of Common Objects

Overlooked Factors in Concept-Based Explanations Dataset Choice Concept Learnability an

SmallCap Lightweight Image Captioning Prompted With Retrieval Augmentation

PIRLNav Pretraining With Imitation and RL Finetuning for ObjectNav

Visual DNA Representing and Comparing Images Using Distributions of Neuron

Hybrid Active Learning via Deep Clustering for Video Action Detection

NoisyTwins Class-Consistent and Diverse Image Generation Through StyleGANs

FaceLit Neural 3D Relightable Faces

Masked Representation Learning for Domain Generalized Stereo Matching

TranSG Transformer-Based Skeleton Graph Prototype Contrastive Learning With Structure-Trajectory Prompt

Fine-Tuned CLIP Models Are Efficient Video Learners

Synthesizing Photorealistic Virtual Humans Through Cross-Modal Disentanglement

Understanding Deep Generative Models With Generalized Empirical Likelihoods

Decoupled Semantic Prototypes Enable Learning From Diverse Annotation Types for

Trace and Pace Controllable Pedestrian Animation via Guided Trajectory Diffusion

Autonomous Manipulation Learning for Similar Deformable Objects via Only On

Crossing the Gap Domain Generalization for Image Captioning

Defining and Quantifying the Emergence of Sparse Concepts in DNNs

Focus on Details Online Multi-Object Tracking With Diverse Fine-Grained Representation

Improving Fairness in Facial Albedo Estimation via Visual-Textual Cues

Masked Jigsaw Puzzle A Versatile Position Embedding for Vision Transformers

Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization

TinyMIM An Empirical Study of Distilling MIM Pre-Trained Models

VolRecon Volume Rendering of Signed Ray Distance Functions for Generalizable

CoralStyleCLIP Co-Optimized Region and Layer Selection for Image Editing

Masked Wavelet Representation for Compact Neural Radiance Fields

NeRFLight Fast and Light Neural Radiance Fields Using a Sh

PivoTAL Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization

Novel Class Discovery for 3D Point Cloud Semantic Segmentation

UMat Uncertainty-Aware Single Image High Resolution Material Capture

Conjugate Product Graphs for Globally Optimal 2D-3D Shape Matching

Boundary-Enhanced Co-Training for Weakly Supervised Semantic Segmentation

Proximal Splitting Adversarial Attack for Semantic Segmentation

PermutoSDF Fast Multi-View Reconstruction With Implicit Surfaces Using Permutohedral Lattices

FJMP Factorized Joint Multi-Agent Motion Prediction Over Learned Directed Acyclic

MM-Diffusion Learning Multi-Modal Diffusion Models for Joint Audio and Video

EventNeRF Neural Radiance Fields From a Single Colour Event Cam

BITE Beyond Priors for Improved Three-D Dog Pose Estimation

DreamBooth Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

GazeNeRF 3D-Aware Gaze Redirection With Neural Radiance Fields

Token Contrast for Weakly-Supervised Semantic Segmentation

Egocentric Auditory Attention Localization in Conversations

Token Turing Machines

Instant Domain Augmentation for LiDAR Semantic Segmentation

OCELOT Overlapped Cell on Tissue Dataset for Histopathology

RobustNeRF Ignoring Distractors With Robust Losses

CUDA Convolution-Based Unlearnable Datasets

Re-IQA Unsupervised Learning for Image Quality Assessment in the Wild

CLIP for All Things Zero-Shot Sketch-Based Image Retrieval Fine-Grained o

Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR

Pic2Word Mapping Pictures to Words for Zero-Shot Composed Image Retrieval

Prefix Conditioning Unifies Language and Label Supervision

RUST Latent Neural Scene Representations From Unposed Imagery

CLIP-Sculptor Zero-Shot Generation of High-Fidelity and Diverse Shapes From Natural

Structured Kernel Estimation for Photon-Limited Deconvolution

WIRE Wavelet Implicit Neural Representations

Simulated Annealing in Early Layers Leads to Better Generalization

Fake It Till You Make It Learning Transferable Representations From

Parameter Efficient Local Implicit Image Function Network for Face Segmentation

OrienterNet Visual Localization in 2D Public Maps With Neural Matching

Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation

Prompt-Guided Zero-Shot Anomaly Action Recognition Using Pretrained Deep Skeleton Features

Unsupervised Intrinsic Image Decomposition With LiDAR Intensity

OReX Object Reconstruction From Planar Cross-Sections Using Neural Fields

Re-GAN Data-Efficient GANs Training via Architectural Reconfiguration

A Large-Scale Robustness Analysis of Video Action Recognition Models

Safe Latent Diffusion Mitigating Inappropriate Degeneration in Diffusion Models

Simple Cues Lead to a Strong Multi-Object Tracker

HuManiFlow Ancestor-Conditioned Normalising Flows on SO3 Manifolds for Human Pose

Independent Component Alignment for Multi-Task Learning

Leveraging Hidden Positives for Unsupervised Semantic Segmentation

AVFormer Injecting Vision Into Frozen Speech Models for Zero-Shot AV-ASR

MixNeRF Modeling a Ray With Mixture Density for Novel View

DeAR Debiasing Vision-Language Models With Additive Residuals

HouseDiffusion Vector Floorplan Generation via a Diffusion Model With Discrete

Similarity Maps for Self-Training Weakly-Supervised Phrase Grounding

HaLP Hallucinating Latent Positives for Skeleton-Based Self-Supervised Learning of Actions

CLIP2Protect Protecting Facial Privacy Using Text-Guided Makeup via Adversarial Latent

Evading Forensic Classifiers With Attribute-Conditioned Adversarial Faces

Incrementer Transformer for Class-Incremental Semantic Segmentation With Knowledge Distillation Focusing

Joint Video Multi-Frame Interpolation and Deblurring Under Unknown Exposure Time

Post-Training Quantization on Diffusion Models

Detecting and Grounding Multi-Modal Media Manipulation

Prompting Large Language Models With Answer Heuristics for Knowledge-Based Visual

ReasonNet End-to-End Driving With Temporal and Global Reasoning

Tensor4D Efficient Neural 4D Decomposition for High-Fidelity Dynamic Reconstruction an

Cant Steal Cont-Steal Contrastive Stealing Attacks Against Image Encoders

PixHt-Lab Pixel Height Based Light Effect Generation for Image Compositing

Structure Aggregation for Cross-Spectral Stereo Image Guided Denoising

DeepMAD Mathematical Architecture Design for Deep Convolutional Neural Network

DiffTalk Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation

DiGA Distil To Generalize and Then Adapt for Domain Adaptive

Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation With

Equiangular Basis Vectors

Fine-Grained Audible Video Description

GINA-3D Learning To Generate Implicit Neural Assets in the Wild

Global-to-Local Modeling for Video-Based 3D Human Pose and Shape Estimation

Learning Human Mesh Recovery in 3D Scenes

LidarGait Benchmarking 3D Gait Recognition With Point Clouds

MoStGAN-V Video Generation With Temporal Motion Styles

PointCMP Contrastive Mask Prediction for Self-Supervised Learning on Point Cloud

Progressive Transformation Learning for Leveraging Virtual Images in Training

Self-Supervised 3D Scene Flow Estimation Guided by Superpoints

StructVPR Distill Structural Knowledge With Weighting Samples for Visual Place

X-Avatar Expressive Human Avatars

PLIKS A Pseudo-Linear Inverse Kinematic Solver for 3D Human Body

Listening Human Behavior 3D Human Pose Estimation With Acoustic Signals

Learning Decorrelated Representations Efficiently Using Fast Fourier Transform

Diffusion-Based Signed Distance Fields for 3D Shape Generation

Deep Depth Estimation From Thermal Image

Local Connectivity-Based Density Estimation for Face Clustering

NIPQ Noise Proxy-Based Integrated Pseudo-Quantization

SDC-UDA Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality

FlowFormer Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation

Learning 3D-Aware Image Synthesis With Unknown Pose Distribution

Make Landscape Flatter in Differentially Private Federated Learning

Matching Is Not Enough A Two-Stage Framework for Category-Agnostic Pose

Top-Down Visual Attention From Analysis by Synthesis

Transformer Scale Gate for Semantic Segmentation

TriDet Temporal Action Detection With Relative Boundary Modeling

GraVoS Voxel Selection for 3D Point-Cloud Detection

3D Neural Field Generation Using Triplane Diffusion

Learning Common Rationale To Improve Self-Supervised Representation for Fine-Grained Visual

Unsupervised Volumetric Animation

Panoptic Lifting for 3D Scene Understanding With Neural Fields

Adaptive Annealing for Robust Geometric Estimation

Unsupervised Object Localization Observing the Background To Discover Objects

Depth Estimation From Camera Image and mmWave Radar Point Cloud

EVAL Explainable Video Anomaly Localization

High-Fidelity Guided Image Synthesis With Latent Diffusion Models

Multi Domain Learning for Motion Magnification

Polynomial Implicit Neural Representations for Large Diverse Datasets

Common Pets in 3D Dynamic New-View Synthesis of Real-Life Deformable

SparsePose Sparse-View Camera Pose Regression and Refinement

Angelic Patches for Improving Third-Party Object Detector Performance

Fully Self-Supervised Depth Estimation From Defocus Clue

CODA-Prompt COntinual Decomposed Attention-Based Prompting for Rehearsal-Free Continual Learning

ConStruct-VL Data-Free Continual Structured VL Concepts Learning

Visual Prompt Tuning for Generative Transfer Learning

Integral Neural Networks

Role of Transients in Two-Bounce Non-Line-of-Sight Imaging

Diffusion Art or Digital Forgery Investigating Data Replication in Diffusion

Advancing Visual Grounding With Scene Knowledge Benchmark and Method

DIFu Depth-Guided Implicit Function for Clothed Human Reconstruction

EcoTTA Memory-Efficient Continual Test-Time Adaptation via Self-Distilled Regularization

Efficient Hierarchical Entropy Model for Learned Point Cloud Compression

Learning With Fantasy Semantic-Aware Virtual Contrastive Constraint for Few-Shot Class-Incremental

Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning

ObjectStitch Object Compositing With Diffusion Model

OPE-SR Orthogonal Position Encoding for Designing a Parameter-Free Upsampling Module

Optimal Proposal Learning for Deployable End-to-End Pedestrian Detection

Optimization-Inspired Cross-Attention Transformer for Compressive Sensing

Robust Single Image Reflection Removal Against Adversarial Attacks

Unsupervised Deep Asymmetric Stereo Matching With Spatially-Adaptive Self-Similarity

SinGRAF Learning a 3D Generative Radiance Field for a Single

MarginMatch Improving Semi-Supervised Learning with Pseudo-Margins

Non-Contrastive Unsupervised Learning of Physiological Signals From Video

Unicode Analogies An Anti-Objectivist Visual Reasoning Challenge

How You Feelin Learning Emotions and Mental States in Movie

Learning Articulated Shape With Keypoint Pseudo-Labels From Web Images

CrOC Cross-View Online Clustering for Dense Visual Representation Learning

The Wisdom of Crowds Temporal Progressive Attention for Early Action

BASiS Batch Aligned Spectral Embedding Space

Omnimatte3D Associating Objects and Their Effects in Unconstrained Monocular Video

ScanDMM A Deep Markov Model of Scanpath Prediction for 360deg

Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment

Bi-Directional Feature Fusion Generative Adversarial Network for Ultra-High Resolution Pathological

BKinD-3D Self-Supervised 3D Keypoint Discovery From Multi-View Videos

Co-Speech Gesture Synthesis by Reinforcement Learning With Contrastive Pre-Trained Rewards

Consistent Direct Time-of-Flight Video Depth Super-Resolution

Correspondence Transformers With Asymmetric Feature Learning and Matching Flow Super-Resolution

Decoupling Learning and Remembering A Bilevel Memory Framework With Knowledge

DeFeeNet Consecutive 3D Human Motion Prediction With Deviation Feedback

Event-Based Frame Interpolation With Ad-Hoc Deblurring

Hierarchical Semantic Contrast for Scene-Aware Video Anomaly Detection

Indiscernible Object Counting in Underwater Scenes

Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning

Learning Semantic-Aware Disentangled Representation for Flexible 3D Human Body Editing

Masked Motion Encoding for Self-Supervised Video Representation Learning

MISC210K A Large-Scale Dataset for Multi-Instance Semantic Correspondence

MOSO Decomposing MOtion Scene and Object for Video Prediction

Next3D Generative Neural Texture Rasterization for 3D-Aware Head Avatars

Pose Synchronization Under Multiple Pair-Wise Relative Poses

RefTeacher A Strong Baseline for Semi-Supervised Referring Expression Comprehension

Regularizing Second-Order Influences for Continual Learning

Rethinking Domain Generalization for Face Anti-Spoofing Separability and Alignment

Single Image Backdoor Inversion via Robust Smoothed Classifiers

Stimulus Verification Is a Universal and Effective Sampler in Multi-Modal

TRACE 5D Temporal Regression of Avatars With Dynamic Cameras in

Ultrahigh Resolution Image/Video Matting With Spatio-Temporal Sparsity

MixSim A Hierarchical Framework for Mixed Reality Traffic Simulation

S3C Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning

Language Adaptive Weight Generation for Multi-Task Visual Grounding

Physics-Driven Diffusion Models for Impact Sound Synthesis From Videos

Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information

High-Resolution Image Reconstruction With Latent Diffusion Models From Human Brain

Breaching FedMD Image Recovery via Paired-Logits Inversion Attack

Visual Atoms Pre-Training Vision Transformers With Sinusoidal Waves

Efficient View Synthesis and 3D-Based Multi-Frame Denoising With Multiplane Feature

3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention

ABLE-NeRF Attention-Based Rendering With Learnable Embeddings for Neural Radiance Field

A New Benchmark On the Utility of Synthetic Data With

Contrastive Grouping With Transformer for Referring Image Segmentation

DETR With Additional Global Aggregation for Cross-Domain Weakly Supervised Object

Fair Scratch Tickets Finding Fair Sparse Networks Without Weight Training

FLAG3D A 3D Fitness Activity Dataset With Language Instruction

Graph Transformer GANs for Graph-Constrained House Generation

HumanBench Towards General Human-Centric Perception With Projector Assisted Pretraining

Intrinsic Physical Concepts Discovery With Object-Centric Predictive Models

Label Information Bottleneck for Label Enhancement

Master Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic

NeuMap Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization

Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation

Parts2Words Learning Joint Embedding of Point Clouds and Texts by

Uncertainty-Aware Unsupervised Image Deblurring With Deep Residual Prior

Unifying Vision Text and Layout for Universal Document Processing

Visual Recognition by Request

Weakly Supervised Posture Mining for Fine-Grained Classification

What Happened 3 Seconds Ago Inferring the Past With Thermal

You Need Multiple Exiting Dynamic Early Exiting for Accelerating Unified

Interactive and Explainable Region-Guided Radiology Report Generation

Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger

Distilling Neural Fields for Real-Time Articulated Shape Reconstruction

Hierarchical Semantic Correspondence Networks for Video Paragraph Grounding

Language-Guided Audio-Visual Source Separation via Trimodal Consistency

Learning on Gradients Generalized Artifacts Representation for GAN-Generated Images Detection

Sample-Level Multi-View Graph Clustering

SMOC-Net Leveraging Camera Pose for Self-Supervised Monocular Object Pose Estimation

Temporal Attention Unit Towards Efficient Spatiotemporal Predictive Learning

Boosting Transductive Few-Shot Fine-Tuning With Margin-Based Uncertainty Weighting and Probability

GALIP Generative Adversarial CLIPs for Text-to-Image Synthesis

Siamese Image Modeling for Self-Supervised Vision Representation Learning

Weakly Supervised Monocular 3D Object Detection Using Multi-View Projection an

ViTs for SITS Vision Transformers for Satellite Image Time Series

Jedi Entropy-Based Localization and Removal of Adversarial Patches

Logical Implications for Visual Question Answering Consistency

CaPriDe Learning Confidential and Private Decentralized Learning Based on Encryption-Friendly

Defending Against Patch-Based Backdoor Attacks on Self-Supervised Learning

Full or Weak Annotations An Adaptive Strategy for Budget-Constrained Annotation

FLEX Full-Body Grasping Without Full-Body Grasps

Generating Part-Aware Editable 3D Shapes Without 3D Supervision

Learning To Zoom and Unzoom

CABM Content-Aware Bit Mapping for Single Image Super-Resolution Network With

GeoMAE Masked Geometric Target Prediction for Self-Supervised Point Cloud Pre-Training

GradICON Approximate Diffeomorphisms via Gradient Inverse Consistency

Integrally Pre-Trained Transformer Pyramid Networks

Manipulating Transfer Learning for Property Inference

Modeling the Distributional Uncertainty for Salient Object Detection Models

Multi-Object Manipulation via Object-Centric Neural Scattering Functions

ResFormer Scaling ViTs With Multi-Resolution Training

Robot Structure Prior Guided Temporal Attention for Camera-to-Robot Pose Estimation

Trainable Projected Gradient Method for Robust Fine-Tuning

Revisiting Reverse Distillation for Anomaly Detection

Energy-Efficient Adaptive 3D Sensing

ORCa Glossy Objects As Radiance-Field Cameras

Breaking the Object in Video Object Segmentation

TeSLA Test-Time Self-Learning With Automatic Adversarial Augmentation

Seeing Through the Glass Neural 3D Reconstruction of Object Inside

ReLight My NeRF A Dataset for Novel View Synthesis an

NeRF-Supervised Deep Stereo

Co-Training 2L Submodels for Visual Recognition

3D Human Pose Estimation via Intuitive Physics

Edges to Shapes to Concepts Adversarial Augmentation for Robust Vision

Hubs and Hyperspheres Reducing Hubness and Improving Transductive Few-Shot Learning

On the Effects of Self-Supervision and Contrastive Alignment in D

FREDOM Fairness Domain Adaptation Approach to Semantic Scene Understanding

SPARF Neural Radiance Fields From Sparse and Noisy Poses

CLIPPO Image-and-Language Understanding From Pixels Only

Consistent View Synthesis With Pose-Guided Diffusion Models

EDGE Editable Dance Generation From Music

Improving Visual Representation Learning Through Perceptual Understanding

Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation

SUDS Scalable Urban Dynamic Scenes

A Bag-of-Prototypes Representation for Dataset-Level Applications

Learning From Noisy Labels With Decoupled Meta Label Purifier

Learning With Noisy Labels via Self-Supervised Adversarial Noisy Masking

Toward Accurate Post-Training Quantization for Image Super Resolution

Visual Query Tuning Towards Effective Usage of Intermediate Representations fo

DeGPR Deep Guided Posterior Regularization for Multi-Class Cell Detection an

Learning Situation Hyper-Graphs for Video Question Answering

SCADE NeRFs from Space Carving With Ambiguity-Aware Depth Estimates

Dynamic Inference With Grounding Based Vision and Language Models

Patch-Craft Self-Supervised Training for Correlated Image Denoising

ASPnet Action Segmentation With Shared-Private Representation of Multiple Data Sources

Tracking Through Containers and Occluders in the Wild

CUF Continuous Upsampling Filters

MobileOne An Improved One Millisecond Mobile Backbone

GeneCIS A Benchmark for General Conditional Image Similarity

Test Time Adaptation With Regularized Loss for Weakly Supervised Salient

JRDB-Pose A Large-Scale Dataset for Multi-Person Pose Estimation and Tracking

CLIP the Gap A Single Domain Generalization Approach for Object

Learning Transformations To Reduce the Geometric Shift in Object Detection

PIVOT Prompting for Video Continual Learning

Connecting Vision and Language With Video Localized Narratives

Multi-Sensor Large-Scale Dataset for Multi-View 3D Reconstruction

A-Cap Anticipation Captioning With Commonsense Knowledge

Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection

Mask-Free OVIS Open-Vocabulary Instance Segmentation Without Manual Mask Annotations

EDICT Exact Diffusion Inversion via Coupled Transformations

Teaching Matters Investigating the Role of Supervision in Vision Transformers

Gated Stereo Joint Depth Estimation From Gated and Wide-Baseline Active

3Mformer Multi-Order Multi-Mode Transformer for Skeletal Action Recognition

Accelerating Vision-Language Pretraining With Free Language Modeling

Adapting Shortcut With Normalizing Flow An Efficient Tuning Framework fo

Adaptive Patch Deformation for Textureless-Resilient Multi-View Stereo

All in One Exploring Unified Video-Language Pre-Training

AltFreezing for More General Video Face Forgery Detection

ALTO Alternating Latent Topologies for Implicit 3D Reconstruction

Are We Ready for Vision-Centric Driving Streaming Perception The ASA

ARO-Net Learning Implicit Fields From Anchored Radial Observations

AttriCLIP A Non-Incremental Learner for Incremental Knowledge Learning

AutoRecon Automated 3D Object Discovery and Reconstruction

A Practical Stereo Depth System for Smart Glasses

A Practical Upper Bound for the Worst-Case Attribution Deviations

BAD-NeRF Bundle Adjusted Deblur Neural Radiance Fields

Balancing Logit Variation for Long-Tailed Semantic Segmentation

BEV-LaneDet An Efficient 3D Lane Detection Based on Virtual Camera

Bi-Directional Distribution Alignment for Transductive Zero-Shot Learning

Bi-LRFusion Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection

Binary Latent Diffusion

CF-Font Content Fusion for Few-Shot Font Generation

Clothed Human Performance Capture With a Double-Layer Neural Radiance Fields

Co-SLAM Joint Coordinate and Sparse Parametric Encodings for Neural Real-Time

Compacting Binary Neural Networks by Sparse Kernel Selection

Complete 3D Human Reconstruction From a Single Incomplete Image

Compression-Aware Video Super-Resolution

Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation

Consistent-Teacher Towards Reducing Inconsistent Pseudo-Targets in Semi-Supervised Object Detection

Context-Aware Pretraining for Efficient Blind Image Decomposition

Cooperation or Competition Avoiding Player Domination for Multi-Target Robustness vi

Cut and Learn for Unsupervised Object Detection and Instance Segmentation

DaFKD Domain-Aware Federated Knowledge Distillation

Decoupling-and-Aggregating for Image Exposure Correction

DeepVecFont-v2 Exploiting Transformers To Synthesize Vector Fonts With Higher Quality

Deep Arbitrary-Scale Image Super-Resolution via Scale-Equivariance Pursuit

Deep Factorized Metric Learning

Deep Hashing With Minimal-Distance-Separated Hash Centers

Deep Learning of Partial Graph Matching via Differentiable Top-K

Detecting Everything in the Open World Towards Universal Object Detection

Dionysus Recovering Scene Structures by Dividing Into Semantic Pieces

DR2 Diffusion-Based Robust Degradation Remover for Blind Face Restoration

DSVT Dynamic Sparse Voxel Transformer With Rotated Sets

Dynamically Instance-Guided Adaptation A Backward-Free Approach for Test-Time Domain Adaptive

Dynamic Graph Learning With Content-Guided Spatial-Frequency Relation Reasoning for Deepfake

EfficientSCI Densely Connected Network With Space-Time Factorization for Large-Scale Video

F2-NeRF Fast Neural Radiance Field Training With Free Camera Trajectories

FeatureBooster Boosting Feature Descriptors With a Lightweight Neural Network

Feature Alignment and Uniformity for Test Time Adaptation

FEND A Future Enhanced Distribution-Aware Contrastive Learning Framework for Long-Tail

Few-Shot Learning With Visual Distribution Calibration and Cross-Modal Distribution Alignment

Flow Supervision for Deformable NeRF

FrustumFormer Adaptive Instance-Aware Resampling for Multi-View 3D Detection

Generalist Decoupling Natural and Robust Generalization

Generalized UAV Object Detection via Frequency Domain Disentanglement

Glocal Energy-Based Learning for Few-Shot Open-Set Recognition

Gradient-Based Uncertainty Attribution for Explainable Bayesian Deep Learning

Hard Patches Mining for Masked Image Modeling

Hunting Sparsity Density-Guided Contrastive Learning for Semi-Supervised Semantic Segmentation

HypLiLoc Towards Effective LiDAR Pose Regression With Hyperbolic Fusion

Imagen Editor and EditBench Advancing and Evaluating Text-Guided Image Inpainting

Images Speak in Images A Generalist Painter for In-Context Visual

Image as a Foreign Language BEiT Pretraining for Vision an

Image Cropping With Spatial-Aware Feature and Rank Consistency

Improving Generalization of Meta-Learning With Inverted Regularization at Inner-Level

Improving Robust Generalization by Direct PAC-Bayesian Bound Minimization

InternImage Exploring Large-Scale Vision Foundation Models With Deformable Convolutions

JAWS Just a Wild Shot for Cinematic Transfer in Neural

LANA A Language-Capable Navigator for Instruction Following and Generation

Learning Bottleneck Concepts in Image Classification

Learning Conditional Attributes for Compositional Zero-Shot Learning

Learning To Detect and Segment for Open Vocabulary Object Detection

Learning Transformation-Predictive Representations for Detection and Description of Local Features

LG-BPN Local and Global Blind-Patch Network for Self-Supervised Real-World Denoising

LiDAR2Map In Defense of LiDAR-Based Semantic Map Construction Using Online

LipFormer High-Fidelity and Generalizable Talking Face Generation With a Pre-Learned

Look Before You Match Instance Understanding Matters in Video Object

LP-DIF Learning Local Pattern-Specific Deep Implicit Function for 3D Objects

Masked Image Modeling With Local Multi-Scale Reconstruction

Masked Video Distillation Rethinking Masked Feature Modeling for Self-Supervised Video

MCF Mutual Correction Framework for Semi-Supervised Medical Image Segmentation

MDL-NAS A Joint Multi-Domain Learning Framework for Vision Transformer

MeMaHand Exploiting Mesh-Mano Interaction for Single Image Two-Hand Reconstruction

MetaMix Towards Corruption-Robust Continual Learning With Temporally Self-Adaptive Data Transformation

MetaViewer Towards a Unified Multi-View Representation

METransformer Radiology Report Generation by Transformer With Multiple Learnable Expert

MHPL Minimum Happy Points Learning for Active Source Free Domain

Model Barrier A Compact Un-Transferable Isolation Domain for Model Intellectual

MoLo Motion-Augmented Long-Short Contrastive Learning for Few-Shot Action Recognition

Multi-Agent Automated Machine Learning

Multi-Modal Learning With Missing Modality via Shared-Specific Feature Modelling

Multilateral Semantic Relations Modeling for Image Text Retrieval

Multimodal Industrial Anomaly Detection via Hybrid Fusion

NeMo Learning 3D Neural Motion Fields From Multiple Video Instances

Neural Fields Meet Explicit Geometric Representations for Inverse Rendering o

Neural Koopman Pooling Control-Inspired Temporal Dynamics Encoding for Skeleton-Based Action

Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos

NeuWigs A Neural Dynamic Model for Volumetric Hair Capture an

Non-Line-of-Sight Imaging With Signal Superresolution Network

Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection

Omni Aggregation Networks for Lightweight Image Super-Resolution

On Calibrating Semantic Segmentation Models Analyses and an Algorithm

On the Pitfall of Mixup for Uncertainty Calibration

Open-Set Fine-Grained Retrieval via Prompting Vision-Language Evaluator

Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning

PDPP Projected Diffusion for Procedure Planning in Instructional Videos

PET-NeuS Positional Encoding Tri-Planes for Neural Surfaces

Pixels Regions and Objects Multiple Enhancement for Salient Object Detection

PlaneDepth Self-Supervised Depth Estimation via Orthogonal Planes

Position-Guided Text Prompt for Vision-Language Pre-Training

Practical Network Acceleration With Tiny Sets

Privacy-Preserving Adversarial Facial Features

Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis

Propagate and Calibrate Real-Time Passive Non-Line-of-Sight Tracking

ProphNet Efficient Agent-Centric Motion Forecasting With Anchor-Informed Proposals

ProTeGe Untrimmed Pretraining for Video Temporal Grounding by Video Temporal

PyPose A Library for Robot Learning With Physics-Based Optimization

Raw Image Reconstruction With Learned Compact Metadata

Rethinking the Correlation in Few-Shot Segmentation A Buoys View

Rethinking the Learning Paradigm for Dynamic Facial Expression Recognition

RIFormer Keep Your Vision Backbone Effective but Removing Token Mixer

Robust Multiview Point Cloud Registration With Reliable Pose Graph Initialization

RODIN A Generative Model for Sculpting 3D Digital Avatars Using

Scene-Aware Egocentric 3D Human Pose Estimation

Score Jacobian Chaining Lifting Pretrained 2D Diffusion Models for 3D

Seeing What You Said Talking Face Generation Guided by

Selective Structured State-Spaces for Long-Form Video Understanding

Semantic Scene Completion With Cleaner Self

Semi-Supervised Parametric Real-World Image Harmonization

Sharpness-Aware Gradient Matching for Domain Generalization

SmartAssign Learning a Smart Knowledge Assignment Strategy for Deraining an

Spatial-Frequency Mutual Learning for Face Super-Resolution

SunStage Portrait Reconstruction and Relighting Using the Sun as

Task Difficulty Aware Parameter Allocation Regularization for Lifelong Learning

Towards Domain Generalization for Multi-View 3D Object Detection in Bird-Eye-View

Towards Professional Level Crowd Annotation of Expert Domain Data

Towards Transferable Targeted Adversarial Examples

Turning Strengths Into Weaknesses A Certified Robustness Inspired Attack Framework

Two-Stream Networks for Weakly-Supervised Temporal Action Localization With Semantic-Aware Mechanisms

VideoMAE V2 Scaling Video Masked Autoencoders With Dual Masking

VL-SAT Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph

YOLOv7 Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors

Zero-Shot Pose Transfer for Unrigged Stylized 3D Characters

Active Exploration of Multimodal Complementarity for Few-Shot Action Recognition

Learning Neural Duplex Radiance Fields for Real-Time View Synthesis

Vita-CLIP Video and Text Adaptive CLIP via Multimodal Prompting

Virtual Occlusions Through Implicit Depth

Power Bundle Adjustment for Large-Scale 3D Reconstruction

Removing Objects From Neural Radiance Fields

Masked Autoencoding Does Not Help Natural Language Supervision at Scale

Adaptive Graph Convolutional Subspace Clustering

Autoregressive Visual Tracking

CFA Class-Wise Calibrated Fair Adversarial Training

Enhancing the Self-Universality for Transferable Targeted Attacks

Fine-Grained Classification With Noisy Labels

Focused and Collaborative Feedback Integration for Interactive Image Segmentation

iCLIP Bridging Image Classification and Contrastive Language-Image Pre-Training for Visual

Inferring and Leveraging Parts From Object Shape for Improving Semantic

Joint Token Pruning and Squeezing Towards More Aggressive Compression o

LEGO-Net Learning Regular Rearrangements of Objects in Rooms

MMANet Margin-Aware Distillation and Modality-Aware Regularization for Incomplete Multimodal Learning

Physically Adversarial Infrared Patches With Learnable Shapes and Locations

Sparsifiner Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers

Super-Resolution Neural Operator

TAPS3D Text-Guided 3D Textured Shape Generation From Pseudo Supervision

Text-Guided Unsupervised Latent Transformation for Multi-Attribute Image Manipulation

Towards Realistic Long-Tailed Semi-Supervised Learning Consistency Is All You Need

3D Human Keypoints Estimation From Point Clouds in the Wild

Event-Based Blurry Frame Interpolation Under Blind Exposure

PersonNeRF Personalized Reconstruction From Photo Collections

BundleSDF Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects

CAP-VSTNet Content Affinity Preserved Versatile Style Transfer

Crowd3D Towards Hundreds of People Reconstruction From a Single Image

DIP Dual Incongruity Perceiving Network for Sarcasm Detection

Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action

Highly Confident Local Structure Based Consensus Graph Learning for Incomplete

Learnable Skeleton-Aware 3D Point Cloud Sampling

Recognizing Rigid Patterns of Unlabeled Point Clouds by Complete an

Black-Box Sparse Adversarial Attack via Multi-Objective Optimisation

Behind the Scenes Density Fields for Single View Reconstruction

Initialization Noise in Image Gradients and Saliency Maps

Heat Diffusion Based Multi-Scale and Geometric Structure-Aware Transformer for Mesh

ConvNeXt V2 Co-Designing and Scaling ConvNets With Masked Autoencoders

Differentiable Shadow Mapping for Efficient Inverse Graphics

Aligning Bag of Regions for Open-Vocabulary Object Detection

Asymmetric Feature Fusion for Image Retrieval

Attention-Based Point Cloud Edge Sampling

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Language

Boosting Detection in Crowd Analysis via Underutilized Output Features

Cap4Video What Can Auxiliary Captions Do for Text-Video Retrieval

CHMATCH Contrastive Hierarchical Matching and Robust Adaptive Threshold Boosted Semi-Supervised

Co-Salient Object Detection With Uncertainty-Aware Group Exchange-Masking

CORA Adapting CLIP for Open-Vocabulary Detection With Region Prompting an

Deep Stereo Video Inpainting

Discriminating Known From Unknown Objects via Structure-Enhanced Recurrent Variational AutoEncoder

DropMAE Masked Autoencoders With Spatial-Attention Dropout for Tracking Tasks

EDA Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding

Fast Point Cloud Generation With Straight Flows

GANHead Towards Generative Animatable Neural Head Avatars

High-Fidelity 3D Face Generation From Natural Language Descriptions

Incremental 3D Semantic Scene Graph Prediction From RGB Sequences

Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement

Logical Consistency and Greater Descriptive Power for Facial Hair Attribute

MagicPony Learning Articulated 3D Animals in the Wild

Masked Scene Contrast A Scalable Framework for Unsupervised 3D Representation

Multiview Compressive Coding for 3D Reconstruction

NeFII Inverse Rendering for Reflectance Decomposition With Near-Field Indirect Illumination

Neural Fourier Filter Bank

NewsNet A Novel Dataset for Hierarchical Temporal Segmentation

OmniObject3D Large-Vocabulary 3D Object Dataset for Realistic Perception Reconstruction an

Pix2map Cross-Modal Retrieval for Inferring Street Maps From Images

PointConvFormer Revenge of the Point-Based Convolution

Referring Multi-Object Tracking

RIDCP Revitalizing Real Image Dehazing via High-Quality Codebook Priors

SCoDA Domain Adaptive Shape Completion for Real Scans

Semi-Supervised Stereo-Based 3D Object Detection via Cross-View Consensus

Semi-Supervised Video Inpainting With Cycle Consistency Constraints

Sparsely Annotated Semantic Segmentation With Adaptive Gaussian Mixtures

Spatiotemporal Self-Supervised Learning for Point Clouds in the Wil

STMixer A One-Stage Sparse Action Detecto

Switchable Representation Learning Framework With Self-Compatibility

Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models

Unsupervised Visible-Infrared Person Re-Identification via Progressive Graph Matching and Alternat

Virtual Sparse Convolution for Multimodal 3D Object Detection

DiffusioNeRF Regularizing Neural Radiance Fields With Denoising Diffusion Models

SQUID Deep Feature In-Painting for Unsupervised Anomaly Detection

Neural Lens Modeling

3D Semantic Segmentation in the Wild Learning Generalized Models fo

CutMIB Boosting Light Field Super-Resolution via Multi-View Image Blending

DLBD A Self-Supervised Direct-Learned Binary Descripto

Endpoints Weight Fusion for Class Incremental Semantic Segmentation

Level-S2fM Structure From Motion on Neural Level Set of Implicit

LSTFE-Net Long Short-Term Feature Enhancement Network for Video Small Object Detection

Masked Images Are Counterfactual Samples for Robust Fine-Tuning

SCPNet Semantic Scene Completion on Point Clou

Structured Sparsity Learning for Efficient Video Super-Resolution

Towards Effective Visual Representations for Partial-Label Learning

VecFontSDF Learning To Reconstruct and Synthesize High-Quality Vector Fonts vi

Active Finetuning Exploiting Annotation Budget in the Pretraining-Finetuning Paradigm

Adversarially Robust Neural Architecture Search for Graph Neural Networks

An Actor-Centric Causality Graph for Asynchronous Temporal Inference in Grou

Blemish-Aware and Progressive Face Retouching With Limited Paired Dat

Category Query Learning for Human-Object Interaction Classification

DINER Disorder-Invariant Implicit Neural Representation

Exploring and Exploiting Uncertainty for Incomplete Multi-View Classification

GP-VTON Towards General Purpose Virtual Try-On via Collaborative Local-Flow Global-Parsing

High-Fidelity 3D GAN Inversion by Pseudo-Multi-View Optimization

MAESTER Masked Autoencoder Guided Segmentation at Pixel Resolution for Accurat

OmniVidar Omnidirectional Depth Estimation From Multi-Fisheye Images

On Data Scaling in Masked Image Modeling

Poly-PC A Polyhedral Network for Multiple Point Cloud Tasks at

RA-CLIP Retrieval Augmented Contrastive Language-Image Pre-Training

Revealing the Dark Secrets of Masked Image Modeling

SmartBrush Text and Shape Guided Object Inpainting With Diffusion Model

Towards a Smaller Student Capacity Dynamic Distillation for Efficient Imag

Toward Stable Interpretable and Lightweight Hyperspectral Super-Resolution

Unpaired Image-to-Image Translation With Shortest Path Regularization

VideoTrack Learning To Track Objects via Video Transform

Visibility Aware Human-Object Interaction Tracking From Single RGB Cam

CodeTalker Speech-Driven 3D Facial Animation With Discrete Motion Prio

SVFormer Semi-Supervised Video Transformer for Action Recognition

CAPE Camera View Position Embedding for Multi-View 3D Object Detection

CASP-Net Rethinking Video Saliency Prediction From an Audio-Visual Consistency Perceptual

FedDM Iterative Distribution Matching for Communication-Efficient Federated Learning

Learning Compact Representations for LiDAR Completion and Generation

Neural Map Prior for Autonomous Driving

Similarity Metric Learning for RGB-Infrared Group Re-Identification

ECON Explicit Clothed Humans Optimized via Normal Integration

Egocentric Video Task Translation

Freestyle Layout-to-Image Synthesis

GarmentTracking Category-Level Garment Pose Tracking

IMP Iterative Matching and Pose Estimation With Adaptive Pooling

SFD2 Semantic-Guided Feature Detection and Description

Stare at What You See Masked Image Modeling Without Reconstruction

Abstract Visual Reasoning An Algebraic Approach for Solving Ravens Progressiv

A Unified Spatial-Angular Structured Light for Single-View Acquisition of Sh

Bias-Eliminating Augmentation Learning for Debiased Federated Learning

Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis

Constructing Deep Spiking Neural Networks From Artificial Neural Networks With

CXTrack Improving 3D Point Cloud Tracking With Contextual Information

DisCoScene Spatially Disentangled Generative Radiance Fields for Controllable 3D-Aware Scen

Dream3D Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Imag

Dynamic Coarse-To-Fine Learning for Oriented Tiny Object Detection

EqMotion Equivariant Multi-Agent Motion Prediction With Invariant Interaction Reasoning

Gaussian Label Distribution Learning for Spherical Image Object Detection

Generating Features With Increased Crop-Related Diversity for Few-Shot Object Detection

Grid-Guided Neural Radiance Fields for Large Urban Scenes

H2ONet Hand-Occlusion-and-Orientation-Aware Network for Real-Time 3D Hand Mesh Reconstruction

HandsOff Labeled Dataset Generation With No Additional Human Annotations

High-Fidelity Generalized Emotional Talking Face Generation With Multi-Modal Emotion Spac

Iterative Geometry Encoding Volume for Stereo Matching

JacobiNeRF NeRF Shaping With Mutual Information Gradients

Learning Dynamic Style Kernels for Artistic Style Trans

Learning Imbalanced Data With Vision Transformers

Learning Multi-Modal Class-Specific Tokens for Weakly Supervised Dense Object Localization

Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision

Learning To Generate Image Embeddings With User-Level Differential Privacy

Low-Light Image Enhancement via Structure Modeling and Guidanc

MEDIC Remove Model Backdoors via Importance Driven Cloning

Meta Compositional Referring Expression Segmentation

MM-3DScene 3D Scene Understanding by Customizing Masked Modeling With Informative-Preserv

Multi-View Adversarial Discriminator Mine the Non-Causal Factors for Object Detection

MV-JAR Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

NeuralLift-360 Lifting an In-the-Wild 2D Photo to a 3D Object

OmniAvatar Geometry-Guided Controllable 3D Head Synthesis

Open-Vocabulary Panoptic Segmentation With Text-to-Image Diffusion Models

PIDNet A Real-Time Semantic Segmentation Network Inspired by PID Controllers

Probabilistic Knowledge Distillation of Face Ensembles

Q-DETR An Efficient Low-Bit Quantized Detection Transform

Seeing Electric Network Frequency From Events

Side Adapter Network for Open-Vocabulary Semantic Segmentation

Toward RAW Object Detection A New Benchmark and a New

Uncovering the Missing Pattern Unified Framework Towards Trajectory Imputation an

UniDexGrasp Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation

Unsupervised 3D Shape Reconstruction by Part Retrieval and Assembly

Unsupervised Domain Adaption With Pixel-Level Discriminator for Image-Aware Layout Generation

V2V4Real A Real-World Large-Scale Dataset for Vehicle-to-Vehicle Cooperative Perception

Video Dehazing via a Multi-Range Temporal Alignment Network With Physical

Visual-Tactile Sensing for In-Hand Object Reconstruction

Where Is My Wallet Modeling Object Proposal Sets for Egocentric

Zero-Shot Dual-Lens Super-Resolution

Zero-Shot Object Counting

Habitat-Matterport 3D Semantics Dataset

Behavioral Analysis of Vision-and-Language Navigation Agents

BEVFormer v2 Adapting Modern Image Backbones to Birds-Eye-View Recognition vi

BEVHeight A Robust Framework for Vision-Based Roadside 3D Object Detection

BiCro Noisy Correspondence Rectification for Multi-Modality Data via Bi-Directional Cross-Modal

Bootstrap Your Own Prior Towards Distribution-Agnostic Novel Class Discovery

Complementary Intrinsics From Neural Radiance Fields and CNNs for Outdoo

Context De-Confounded Emotion Recognition

ContraNeRF Generalizable Neural Radiance Fields for Synthetic-to-Real Novel View Synthesis

DeCo Decomposition and Reconstruction for Compositional Temporal Grounding via Coarse-To-Fin

Diffusion Probabilistic Model Made Slim

Directional Connectivity-Based Segmentation of Medical Images

Efficient On-Device Training via Gradient Filtering

FreeNeRF Improving Few-Shot Neural Rendering With Free Frequency Regularization

GD-MAE Generative Decoder for MAE Pre-Training on LiDAR Point Clouds

Geometry and Uncertainty-Aware 3D Point Cloud Class-Incremental Semantic Segmentation

Global Vision Transformer Pruning With Hessian-Aware Saliency

Good Is Bad Causality Inspired Cloth-Debiasing for Cloth-Changing Person Re-Identification

HOTNAS Hierarchical Optimal Transport for Neural Architecture Search

IDGI A Framework To Eliminate Explanation Noise From Integrated Gradients

Improving Visual Grounding by Encouraging Consistent Gradient-Based Explanations

K3DN Disparity-Aware Kernel Estimation for Dual-Pixel Defocus Deblurring

Language in a Bottle Language Model Guided Concept Bottlenecks fo

Learning Event Guided High Dynamic Range Video Reconstruction

MIANet Aggregating Unbiased Instance and General Information for Few-Shot Semantic

Modeling Entities As Semantic Points for Visual Information Extraction in

NeRFVS Neural Radiance Fields for Free View Synthesis via Geometry

Neural Vector Fields Implicit Representation by Explicit Learning

Neural Volumetric Memory for Visual Locomotion Control

Object Pose Estimation With Statistical Guarantees Conformal Keypoint Detection an

Paint by Example Exemplar-Based Image Editing With Diffusion Models

Panoptic Video Scene Graph Generation

POEM Reconstructing Hand in a Point Embedded Multi-View Stereo

Progressive Open Space Expansion for Open-Set Model Attribution

Pruning Parameterization With Bi-Level Optimization for Efficient Semantic Segmentation on

PVT-SSD Single-Stage 3D Object Detector With Point-Voxel Transform

QPGesture Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gestu

Reconstructing Animatable Categories From Videos

ReCo Region-Controlled Text-to-Image Generation

Relational Space-Time Query in Long-Form Videos

Resource-Efficient RGBD Aerial Tracking

Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation

RILS Masked Visual Reconstruction in Language Semantic Spac

Semantic Human Parsing via Scalable Semantic Transfer Over Multiple Label

TINC Tree-Structured Implicit Neural Compression

TopDiG Class-Agnostic Topological Directional Graph Extraction From Remote Sensing Images

Towards Bridging the Performance Gaps of Joint Energy-Based Models

Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition

UniSim A Neural Closed-Loop Sensor Simulato

VectorFloorSeg Two-Stream Graph Attention Network for Vectorized Roughcast Floorplan Segmentation

Vector Quantization With Self-Attention for Quality-Independent Representation Learning

Vid2Seq Large-Scale Pretraining of a Visual Language Model for Dens

Video Event Restoration Based on Keyframes for Video Anomaly Detection

Visual Recognition-Driven Image Restoration for Multiple Degradation With Intrinsic Semantics

A Unified HDR Imaging Method With Pixel and Patch Level

CIMI4D A Large Multimodal Climbing Motion Dataset Under Human-Scene Interactions

GCFAgg Global and Cross-View Feature Aggregation for Multi-View Clustering

Linking Garment With Person via Semantically Associated Landmarks for Virtual

Long-Term Visual Localization With Mobile Sensors

NeRF-DS Neural Radiance Fields for Dynamic Specular Objects

PlenVDB Memory Efficient VDB-Based Radiance Fields for Fast Training an

SMAE Few-Shot Learning for HDR Deghosting With Saturation-Aware Masked Autoencoders

Towards Trustable Skin Cancer Diagnosis via Rewriting Models Decision

Two-Shot Video Object Segmentation

Universal Instance Perception As Object Discovery and Retrieval

DetCLIPv2 Scalable Open-Vocabulary Object Detection Pre-Training via Word-Region Alignment

Explicit Boundary Guided Semi-Push-Pull Contrastive Learning for Supervised Anomaly Detection

HGNet Learning Hierarchical Geometry From Points Edges and Surfaces

Hi-LASSIE High-Fidelity Articulated Shape and Skeleton Discovery From Sparse Imag

Large-Scale Training Data Search for Object Re-Identification

Local Implicit Normalizing Flow for Arbitrary-Scale Image Super-Resolution

Teacher-Generated Spatial-Attention Labels Boost Robustness and Accuracy of Contrastive Models

Visual-Language Prompt Tuning With Knowledge-Guided Context Optimization

Meta-Personalizing Vision-Language Models To Find Named Instances in Video

AccelIR Task-Aware Image Compression for Accelerating Neural Restoration

Affordance Diffusion Synthesizing Hand-Object Interactions

Decoupling Human and Camera Motion From Videos in the Wil

DeepSolo Let Transformer Decoder With Explicit Points Solo for Text

DistilPose Tokenized Pose Regression With Heatmap Distillation

Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles

NEF Neural Edge Fields for 3D Parametric Curve Reconstruction From

Partial Network Cloning

PVO Panoptic Visual Odometry

Self-Supervised Super-Plane for Neural 3D Reconstruction

Mapping Degeneration Meets Label Evolution Learning Infrared Small Target Detection

1 VS 100 Parameter-Efficient Low Rank Adapter for Dense Predictions

3D GAN Inversion With Facial Symmetry Prio

AGAIN Adversarial Training With Attribution Span Enlargement and Hybrid Featu

GIVL Improving Geographical Inclusivity of Vision-Language Models With Pre-Training Methods

Gloss Attention for Gloss-Free Sign Language Translation

Hi4D 4D Instance Segmentation of Close Human Interaction

Multi-Space Neural Radiance Fields

NeRFInvertor High Fidelity NeRF-GAN Inversion for Single-Shot Real Image Animation

A Simple Framework for Text-Supervised Semantic Segmentation

Generating Holistic 3D Human Motion From Speech

MIME Human-Aware 3D Scene Generation

NAR-Former Neural Architecture Representation Learning Towards Holistic Attributes Prediction

Towards Artistic Image Aesthetics Assessment A Large-Scale Dataset an

Weakly-Supervised Single-View Image Relighting

A General Regret Bound of Preconditioned Gradient Method for DNN

Cross-Guided Optimization of Radiance Fields With Multi-View Image Super-Resolution fo

Towards End-to-End Generative Modeling of Long Videos With Memory-Efficient Bidirectional

Light Source Separation and Intrinsic Image Decomposition Under AC Illumination

Rawgment Noise-Accounted RAW Augmentation Enables Recognition in a Wide Variety

Deformable Mesh Transformer for 3D Human Mesh Recovery

Castling-ViT Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision

UTM A Unified Multiple Object Tracking Model With Identity-Aware Featu

Bi3D Bi-Domain Active Learning for Cross-Domain 3D Object Detection

Devil Is in the Queries Advancing Mask Transformers for Real-Worl

Robust Test-Time Adaptation in Dynamic Scenarios

You Are Catching My Attention Are Vision Transformers Bad Learners

Connecting the Dots Floorplan Reconstruction Using Two-Level Queries

IFSeg Image-Free Semantic Segmentation via Vision-Language Model

Accidental Light Probes

ACR Attention Collaboration-Based Regressor for Arbitrary Two-Hand Reconstruction

Adaptive Spot-Guided Transformer for Consistent Local Feature Matching

ANetQA A Large-Scale Benchmark for Fine-Grained Compositional Reasoning Over Untrimm

Block Selection Method for Using Feature Norm in Out-of-Distribution Detection

Boost Vision Transformer With GPU-Friendly Sparsity and Quantization

CelebV-Text A Large-Scale Facial Text-Video Dataset

Data-Free Knowledge Distillation via Feature Exchange and Activation Region Constraint

Distribution Shift Inversion for Out-of-Distribution Prediction

DyLiN Making Light Field Networks Dynamic

Efficient Loss Function by Minimizing the Detrimental Effect of Floating-Point

Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation

Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning

Graphics Capsule Learning Hierarchical 3D Face Representations From 2D Images

Hint-Aug Drawing Hints From Foundation Vision Transformers Towards Boosted Few-Shot

How To Prevent the Continuous Damage of Noises To Model

Learning Procedure-Aware Video Representation From Instructional Videos and Their Narrations

MAGVIT Masked Generative Video Transform

Mind the Label Shift of Augmentation-Based Graph OOD Generalization

MonoHuman Animatable Human Neural Field From Monocular Video

MVImgNet A Large-Scale Dataset of Multi-View Images

On the Difficulty of Unpaired Infrared-to-Visible Video Translation Fine-Grained Content-Rich

OSRT Omnidirectional Image Super-Resolution With Distortion-Aware Transform

Overcoming the Trade-Off Between Accuracy and Plausibility in 3D Han

PanelNet Understanding 360 Indoor Environment via Panel Representation

PEAL Prior-Embedded Explicit Attention Learning for Low-Overlap Point Cloud Registration

Phase-Shifting Coder Predicting Accurate Orientation in Oriented Object Detection

Range-Nullspace Video Frame Interpolation With Focalized Motion Estimation

Rotation-Invariant Transformer for Point Cloud Matching

Semi-Supervised Domain Adaptation With Source Label Adaptation

Task Residual for Tuning Vision-Language Models

TOPLight Lightweight Neural Networks With Task-Oriented Pretraining for Visible-Infrared Recognition

Turning a CLIP Model Into a Scene Text Detecto

V2X-Seq A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception an

Video Probabilistic Diffusion Models in Projected Latent Spac

X-Pruner eXplainable Pruning for Vision Transformers

Zero-Shot Referring Image Segmentation With Global-Local Context Features

Hierarchical Video-Moment Retrieval and Step-Captioning

Train/Test-Time Adaptation With Retrieval

Discovering the Real Association Multimodal Causal Reasoning in Video Question

AutoLabel CLIP-Based Framework for Open-Set Video Domain Adaptation

OCTET Object-Aware Counterfactual Explanations

3D-Aware Facial Landmark Detection via Multi-View Consistent Training on Synthetic

CLIP2 Contrastive Language-Image-Point Pretraining From Real-World Point Cloud Dat

ConZIC Controllable Zero-Shot Image Captioning by Sampling-Based Polishing

Deep Fair Clustering via Maximizing and Minimizing Mutual Information Theory

Distilling Focal Knowledge From Imperfect Expert for 3D Object Detection

Learning Transferable Spatiotemporal Representations From Natural Script Knowledg

PEFAT Boosting Semi-Supervised Medical Image Classification via Pseudo-Loss Estimation an

Real-Time Multi-Person Eyeblink Detection in the Wild for Untrimmed Video

SceneComposer Any-Level Semantic Image Synthesis

Feature Representation Learning With Adaptive Displacement Generation and Transformer Fusion

3D-Aware Object Goal Navigation via Simultaneous Exploration and Identification

3D Registration With Maximal Cliques

Accelerating Dataset Distillation via Model Augmentation

Aligning Step-by-Step Instructional Diagrams to Video Demonstrations

Analyzing Physical Impacts Using Transient Surface Wave Imaging

A Loopback Network for Explainable Microvascular Invasion Classification

Backdoor Defense via Deconfounded Representation Learning

Blind Image Quality Assessment via Vision-Language Correspondence A Multitask Learning

Boosting Verified Training for Robust Image Classifications via Abstraction

Boosting Video Object Segmentation via Space-Time Correspondence Learning

CLAMP Prompt-Based Contrastive Learning for Connecting Language and Animal Pos

Class Relationship Embedded Learning for Source-Free Unsupervised Domain Adaptation

CloSET Modeling Clothed Humans on Continuous Surface With Explicit Templat

Coaching a Teachable Student

Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning

CompletionFormer Depth Completion With Convolutions and Vision Transformers

DA-DETR Domain Adaptive Detection Transformer With Information Fusion

Decoupling MaxLogit for Out-of-Distribution Detection

Delivering Arbitrary-Modal Semantic Segmentation

Dense Distinct Query for End-to-End Object Detection

DeSTSeg Segmentation Guided Denoising Student-Teacher for Anomaly Detection

DiffCollage Parallel Generation of Large Content With Diffusion Models

Differentiable Architecture Search With Random Features

Dimensionality-Varying Diffusion Process

Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-In

Document Image Shadow Removal Guided by Color-Aware Backgroun

Efficient Map Sparsification Based on 2D and 3D Discretized Grids

Efficient RGB-T Tracking via Cross-Modality Distillation

Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervis

Exploring Intra-Class Variation Factors With Learnable Cluster Prompts for Semi-Supervis

Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video

Federated Domain Generalization With Generalization Adjustment

Frame-Event Alignment and Fusion Network for High Frame Rate Tracking

Frame Flexible Network

Frequency-Modulated Point Cloud Rendering With Easy Editing

Generalization Matters Loss Minima Flattening via Parameter Hybridization for Efficient

Generating Human Motion From Textual Descriptions With Discrete Representations

GeoMVSNet Learning Multi-View Stereo With Geometry Perception

Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization

GrowSP Unsupervised Semantic Segmentation of 3D Point Clouds

Hyperspherical Embedding for Point Cloud Completion

Implicit Surface Contrastive Clustering for LiDAR Point Clouds

Improving Graph Representation for Point Cloud Segmentation via Attentive Filtering

Improving the Transferability of Adversarial Samples by Path-Augmented Metho

Ingredient-Oriented Multi-Degradation Learning for Image Restoration

Inversion-Based Style Transfer With Diffusion Models

Layout-Based Causal Inference for Object Navigation

Learning 3D Representations From 2D Pre-Trained Models via Image-to-Point Mask

Learning Debiased Representations via Conditional Attribute Interpolation

Learning Emotion Representations From Verbal and Nonverbal Communication

Learning Neural Proto-Face Field for Disentangled 3D Face Modeling in

Learning To Generate Language-Supervised and Open-Vocabulary Scene Graph Using Pre-Train

Lite-Mono A Lightweight CNN and Transformer Architecture for Self-Supervised Monocul

LOGO A Long-Form Video Dataset for Group Action Quality Assessment

Lookahead Diffusion Probabilistic Models for Refining Mean Estimation

LVQAC Lattice Vector Quantization Coupled With Spatially Adaptive Companding fo

MD-VQA Multi-Dimensional Quality Assessment for UGC Live Videos

MetaPortrait Identity-Preserving Talking Head Generation With Fast Personalized Adaptation

Modeling Video As Stochastic Processes for Fine-Grained Video Representation Learning

MOTRv2 Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors

MP-Former Mask-Piloted Transformer for Image Segmentation

Multi-View Stereo Representation Revisit Region-Aware MVSNet

Nerflets Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation

NeuralDome A Neural Modeling Pipeline on Multi-View Human-Object Interactions

NICO Towards Better Benchmarking for Domain Generalization

Object Detection With Self-Supervised Scene Adaptation

Painting 3D Nature in 2D View Synthesis of Natural Scenes

PeakConv Learning Peak Receptive Field for Radar Semantic Segmentation

PHA Patch-Wise High-Frequency Augmentation for Transformer-Based Person Re-Identification

PointCert Point Cloud Classification With Deterministic Certified Robustness Guarantees

PointDistiller Structured Knowledge Distillation Towards Efficient and Compact 3D Detection

PRISE Demystifying Deep Lucas-Kanade With Strongly Star-Convex Constraints for Multimodel

PromptCAL Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel

Prompt Generate Then Cache Cascade of Foundation Models Makes Strong

Prototypical Residual Networks for Anomaly Detection and Localization

Quantum-Inspired Spectral-Spatial Pyramid Network for Hyperspectral Image Classification

Real-Time Controllable Denoising for Image and Video

Ref-NPR Reference-Based Non-Photorealistic Radiance Fields for Controllable Scene Stylization

Regularized Vector Quantization for Tokenized Image Synthesis

Revisiting Rotation Averaging Uncertainties and Robust Losses

Revisiting the Stack-Based Inverse Tone Mapping

SadTalker Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Singl

Seeing a Rose in Five Thousand Ways

Semi-DETR Semi-Supervised Object Detection With Detection Transformers

SINE SINgle Image Editing With Text-to-Image Diffusion Models

Skinned Motion Retargeting With Residual Perception of Motion Semantics

Starting From Non-Parametric Networks for 3D Point Cloud Analysis

Structural Multiplane Image Bridging Neural View Synthesis and 3D Reconstruction

Text-Visual Prompting for Efficient 2D Temporal Video Grounding

TokenHPE Learning Orientation Tokens for Efficient Head Pose Estimation vi

Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors

Towards Unbiased Volume Rendering of Neural Implicit Surfaces With Geometry

Towards Unsupervised Object Detection From LiDAR Point Clouds

Transferable Adversarial Attacks on Vision Transformers With Token Gradient Regularization

Transforming Radiance Field With Lipschitz Network for Photorealistic 3D Scen

Two-Stage Co-Segmentation Network Based on Discriminative Representation for Recovering Human

Uni3D A Unified Baseline for Multi-Dataset 3D Object Detection

UniDAformer Unified Domain Adaptive Panoptic Segmentation Transformer via Hierarchical Mask

Unlearnable Clusters Towards Label-Agnostic Unlearnable Examples

VQACL A Novel Visual Question Answering Continual Learning Setting

Weakly Supervised Segmentation With Point Annotations for Histopathology Images vi

Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal

WeatherStream Light Transport Automation of Single Image Deweathering

Wide-Angle Rectification via Content-Aware Conformal Mapping

ARKitTrack A New Diverse Dataset for Tracking Using Mobile RGB-D

Augmentation Matters A Simple-Yet-Effective Approach to Semi-Supervised Semantic Segmentation

CDDFuse Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion

Comprehensive and Delicate An Efficient Transformer for Image Restoration

DiffSwap High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion

DNeRV Modeling Inherent Dynamics via Difference Neural Representation for Videos

Exploring Incompatible Knowledge Transfer in Few-Shot Image Generation

Few-Shot Class-Incremental Learning via Class-Aware Bilateral Distillation

High-Frequency Stereo Matching Network

Improved Distribution Matching for Dataset Condensation

Instance-Specific and Model-Adaptive Supervision for Semi-Supervised Semantic Segmentation

Learning Anchor Transformations for 3D Garment Animation

Learning Video Representations From Large Language Models

MetaFusion Infrared and Visible Image Fusion via Meta-Feature Embedding From

Minimizing Maximum Model Discrepancy for Transferable Black-Box Targeted Attacks

OmniAL A Unified CNN Framework for Unsupervised Anomaly Localization

Open Set Action Recognition via Multi-Label Evidential Learning

PoseFormerV2 Exploring Frequency Domain for Efficient and Robust 3D Human

Quality-Aware Pre-Trained Models for Blind Image Quality Assessment

Re2TAL Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization

Representation Learning for Visual Object Tracking by Masked Appearance Trans

Rethinking Gradient Projection Continual Learning Stability Plasticity Feature Spac

Search-Map-Search A Frame Selection Paradigm for Action Recognition

Semi-Supervised Hand Appearance Recovery via Structure Disentanglement and Dual Adversarial

Streaming Video Model

The Resource Problem of Using Linear Layer Leakage Attack in

Towards Better Stability and Adaptability Improve Online Self-Training for Model

Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation

Both Style and Distortion Matter Dual-Path Unsupervised Domain Adaptation fo

CAMS CAnonicalized Manipulation Spaces for Category-Level Functional Hand-Object Manipulation Synthesis

Curricular Contrastive Regularization for Physics-Aware Single Image Dehazing

CVT-SLR Contrastive Visual-Textual Transformation for Sign Language Recognition With Variational

EditableNeRF Editing Topologically Varying Neural Radiance Fields by Key Points

EXIF As Language Learning Cross-Modal Associations Between Images and Cam

FeatER An Efficient Network for Human Reconstruction via Feature Map-Bas

HairStep Transfer Synthetic to Real Using Strand and Depth Maps

HS-Pose Hybrid Scope Feature Extraction for Category-Level Object Pose Estimation

LayoutDiffusion Controllable Diffusion Model for Layout-to-Image Generation

Learning Visibility Field for Detailed 3D Human Reconstruction and Relighting

NeuFace Realistic 3D Neural Face Rendering From Multi-View Images

NeuralPCI Spatio-Temporal Neural Field for 3D Point Cloud Multi-Frame Non-Lin

Open-Category Human-Object Interaction Pre-Training via Language Modeling Framework

PointAvatar Deformable Point-Based Head Avatars From Videos

POTTER Pooling Attention Transformer for Efficient Human Mesh Recovery

Prototype-Based Embedding Network for Scene Graph Generation

TrojViT Trojan Insertion in Vision Transformers

Where Is My Spot Few-Shot Image Generation via Latent Subspac

Decentralized Learning With Multi-Headed Distillation

Blur Interpolation Transformer for Real-World Motion From Blu

Identity-Preserving Talking Face Generation With Landmark and Appearance Priors

Understanding Imbalanced Semantic Segmentation Through Neural Collaps

Adaptive Sparse Pairwise Loss for Object Re-Identification

BEVDC Birds-Eye View Assisted Training for Depth Completion

Class-Conditional Sharpness-Aware Minimization for Deep Long-Tailed Recognition

Efficient Second-Order Plane Adjustment

Exploring Motion Ambiguity and Alignment for High-Quality Video Frame Interpolation

How Can Objects Help Action Recognition

Human Body Shape Completion With Implicit Shape and Flow Learning

HyperMatch Noise-Tolerant Semi-Supervised Learning via Relaxed Contrastive Constraint

Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test G

Instance-Aware Domain Generalization for Face Anti-Spoofing

Interactive Segmentation As Gaussian Process Classification

Joint Visual Grounding and Tracking With Natural Language Specification

Learning Discriminative Representations for Skeleton Based Action Recognition

MonoATT Online Monocular 3D Object Detection With Adaptive Token Transform

Multi-Granularity Archaeological Dating of Chinese Bronze Dings Based on

NeRFLix High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-Viewpoint

NeRF in the Palm of Your Hand Corrective Augmentation fo

Neural Texture Synthesis With Guided Correspondenc

Non-Contrastive Learning Meets Language-Image Pre-Training

OcTr Octree-Based Transformer for 3D Object Detection

Procedure-Aware Pretraining for Instructional Video Understanding

Query-Centric Trajectory Prediction

Relightable Neural Human Assets From Multi-View Gradient Illuminations

RepMode Learning to Re-Parameterize Diverse Experts for Subcellular Structure Prediction

Revisiting Prototypical Network for Cross Domain Few-Shot Learning

Shifted Diffusion for Text-to-Image Generation

SparseFusion Distilling View-Conditioned Diffusion for 3D Reconstruction

STAR Loss Reducing Semantic Ambiguity in Facial Landmark Detection

Texture-Guided Saliency Distilling for Unsupervised Salient Object Detection

The Treasure Beneath Multiple Annotations An Uncertainty-Aware Edge Detector

UDE A Unified Driving Engine for Human Motion Generation

UniDistill A Universal Cross-Modality Knowledge Distillation Framework for 3D Object

Unsupervised Cumulative Domain Adaptation for Foggy Scene Optical Flow

ZegCLIP Towards Adapting CLIP for Zero-Shot Semantic Segmentation

Deep Semi-Supervised Metric Learning With Mixed Label Propagation

GKEAL Gaussian Kernel Embedded Analytic Learning for Few-Shot Class Incremental

Towards Stable Human Pose Estimation via Cross-View Fusion and Foot

BiFormer Vision Transformer With Bi-Level Routing Attention

Conditional Text Image Generation With Diffusion Models

Confidence-Aware Personalized Federated Learning via Variational Expectation Maximization

ConQueR Query Contrast Voxel-DETR for 3D Object Detection

Continual Semantic Segmentation With Automatic Memory Sample Selection

Curricular Object Manipulation in LiDAR-Based Object Detection

E2PN Efficient SE3-Equivariant Point Network

EXCALIBUR Encouraging and Evaluating Embodied Exploration

I2-SDF Intrinsic Indoor Scene Reconstruction and Editing via Raytracing in

IPCC-TP Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory

Knowledge Combination To Learn Rotated Detection Without Rotated Annotation

Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple

LightedDepth Video Depth Estimation in Light of Limited Inference View

NerVE Neural Volumetric Edges for Parametric Curve Extraction From Point

Occlusion-Free Scene Recovery via Neural Radiance Fields

OpenMix Exploring Outlier Samples for Misclassification Detection

Patch-Mix Transformer for Unsupervised Domain Adaptation A Game Perspective

PMatch Paired Masked Image Modeling for Dense Geometric Matching

Probability-Based Global Cross-Modal Upsampling for Pansharpening

R2Former Unified Retrieval and Reranking Transformer for Place Recognition

ScaleKD Distilling Scale-Aware Knowledge in Small Object Detector

STMT A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition

Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation

TopNet Transformer-Based Object Placement Network for Image Compositing

Transductive Few-Shot Learning With Prototype-Based Label Propagation by Iterative Graph

TryOnDiffusion A Tale of Two UNets

Understanding the Robustness of 3D Object Detection With Birds-Eye-View Representations

VDN-NeRF Resolving Shape-Radiance Ambiguity via View-Dependence Normalization

Visual Prompt Multi-Modal Tracking

Instant Volumetric Head Avatars

Multi-View Reconstruction Using Signed Ray Distance Functions SRDF

AutoFocusFormer Image Segmentation off the Grid

PROB Probabilistic Objectness for Open World Object Detection

CLOTH4D A Dataset for Clothed Human Reconstruction

Generalized Decoding for Pixel Image and Language

Natural Language-Assisted Sign Language Recognition
