WACV 2024 Paper Collection

冯大仙 2024-02-29 14:40

PDA-RWSR Pixel-Wise Degradation Adaptive Real-World Super-Resolution

ScanEnts3D Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D

VD-GR Boosting Visual Dialog With Cascaded Spatial-Temporal Multi-Modal Graphs

ParticleNeRF A Particle-Based Encoding for Online Neural Radiance Fields

Real Time GAZED Online Shot Selection and Editing of Virtual

SeaTurtleID2022 A Long-Span Dataset for Reliable Sea Turtle Re-Identification

ARNIQA Learning Distortion Manifold for Image Quality Assessment

Reference-Based Restoration of Digitized Analog Videotapes

Causal Analysis for Robust Interpretability of Neural Networks

Unsupervised Co-Generation of Foreground-Background Segmentation From Text-to-Image Synthesis

OptFlow Fast Optimization-Based Scene Flow Estimation Without Supervision

Cross-Feature Contrastive Loss for Decentralized Deep Learning on Heterogeneous Dat

A Coarse-To-Fine Pseudo-Labeling C2FPL Framework for Unsupervised Video Anomaly Detection

Optimizing Long-Term Robot Tracking With Multi-Platform Sensor Fusion

OVeNet Offset Vector Network for Semantic Segmentation

P-Age Pexels Dataset for Robust Spatio-Temporal Apparent Age Classification

Self-Supervised Learning With Masked Autoencoders for Teeth Segmentation From Intra-Oral

DDAM-PS Diligent Domain Adaptive Mixer for Person Search

Domain Generalization by Rejecting Extreme Augmentations

Late to the Party On-Demand Unlabeled Personalized Federated Learning

Self-Supervised Learning for Visual Relationship Detection Through Masked Bounding Box

Elusive Images Beyond Coarse Analysis for Fine-Grained Recognition

Amodal Intra-Class Instance Segmentation Synthetic Datasets and Benchmark

High-Fidelity Zero-Shot Texture Anomaly Localization Using Feature Correspondence Analysis

Blurry Video Compression A Trade-Off Between Visual Enhancement and Dat

Hybrid Sample Synthesis-Based Debiasing of Classifier in Limited Data Setting

TransFed A Way To Epitomize Focal Modulation Using Transformer-Based Federat

Continuous Adaptation for Interactive Segmentation Using Teacher-Student Architectu

Enhancing Multi-View Pedestrian Detection Through Generalized 3D Feature Pulling

Beyond Self-Attention Deformable Large Kernel Attention for Medical Image Segmentation

EmoStyle One-Shot Facial Expression Editing Using Continuous Emotion Parameters

Neural Echos Depthwise Convolutional Filters Replicate Biological Receptive Fields

Temporally-Consistent Video Semantic Segmentation With Bidirectional Occlusion-Guided Feature Propagation

Partial Binarization of Neural Networks for Budget-Aware Efficient Learning

AMEND Adaptive Margin and Expanded Neighborhood for Efficient Generalized Category

United We Stand Divided We Fall UnityGraph for Unsupervised Procedu

Weakly-Supervised Representation Learning for Video Alignment and Analysis

ProcSim Proxy-Based Confidence for Robust Similarity Learning

Fixed Pattern Noise Removal for Multi-View Single-Sensor Infrared Cam

FOSSIL Free Open-Vocabulary Semantic Segmentation Through Synthetic References Retrieval

MoRF Mobile Realistic Fullbody Avatars From a Monocular Video

EfficientAD Accurate Visual Anomaly Detection at Millisecond-Level Latencies

Beyond Active Learning Leveraging the Full Potential of Human Interaction

Multi-Source Domain Adaptation for Object Detection With Prototype-Based Mean Teach

Learning Class and Domain Augmentations for Single-Source Open-Domain Generalization

Adversarial Likelihood Estimation With One-Way Flows

IKEA Ego 3D Dataset Understanding Furniture Assembly Actions From Ego-View

Volumetric Disentanglement for 3D Scene Manipulation

PETIT-GAN Physically Enhanced Thermal Image-Translating Generative Adversarial Network

NOMAD A Natural Occluded Multi-Scale Aerial Dataset for Emergency Respons

What's Outside the Intersection Fine-Grained Error Analysis for Semantic Segmentation

Guided Distillation for Semi-Supervised Instance Segmentation

EvDNeRF Reconstructing Event Data With Dynamic Neural Radiance Fields

TriPlaneNet An Encoder for EG3D Inversion

HALSIE Hybrid Approach to Learning Segmentation by Simultaneously Exploiting Imag

Multi-View Classification Using Hybrid Fusion and Mutual Distillation

ArtQuest Countering Hidden Language Biases in ArtVQ

Feed-Forward Latent Domain Adaptation

From Chaos to Calibration A Geometric Mutual Information Approach To

STYLIP Multi-Scale Style-Conditioned Prompt Learning for CLIP-Based Domain Generalization

FOUND Foot Optimization With Uncertain Normals for Surface Deformation Using

SupeRVol Super-Resolution Shape and Reflectance Estimation in Inverse Volume Rendering

Investigating the Role of Attribute Context in Vision-Language Models fo

MEGANet Multi-Scale Edge-Guided Attention Network for Weak Boundary Polyp Segmentation

UOW-Vessel A Benchmark Dataset of High-Resolution Optical Satellite Images fo

CHAI Craters in Historical Aerial Images

Spiking Denoising Diffusion Probabilistic Models

What Decreases Editing Capability Domain-Specific Hybrid Refinement for Improved GAN

ClusterFix A Cluster-Based Debiasing Approach Without Protected-Group Supervision

Pixel-Grounded Prototypical Part Networks

Location-Aware Self-Supervised Transformers for Semantic Segmentation

WildlifeDatasets An Open-Source Toolkit for Animal Re-Identification

Unsupervised and Semi-Supervised Co-Salient Object Detection via Segmentation Frequency Statistics

Learning-Based Spotlight Position Optimization for Non-Line-of-Sight Human Localization and Postu

PhISH-Net Physics Inspired System for High Resolution Underwater Image Enhancement

BEVMap Map-Aware BEV Modeling for 3D Perception

Fast Sun-Aligned Outdoor Scene Relighting Based on TensoRF

FLORA Fine-Grained Low-Rank Architecture Search for Vision Transform

LibreFace An Open-Source Toolkit for Deep Facial Expression Analysis

Shape From Shading for Robotic Manipulation

Continual Learning of Unsupervised Monocular Depth From Videos

3D Reconstruction of Interacting Multi-Person in Clothing From a Singl

NCIS Neural Contextual Iterative Smoothing for Purifying Adversarial Perturbations

Stereo Matching in Time 100 FPS Video Stereo Matching fo

A Sequential Learning-Based Approach for Monocular Human Performance Captu

Depth From Asymmetric Frame-Event Stereo A Divide-and-Conquer Approach

Letting 3D Guide the Way 3D Guided 2D Few-Shot Imag

Longformer Longitudinal Transformer for Alzheimer's Disease Classification With Structural MRIs

Panelformer Sewing Pattern Reconstruction From 2D Garment Images

Pixel Matching Network for Cross-Domain Few-Shot Segmentation

Residual Graph Convolutional Network for Bird's-Eye-View Semantic Segmentation

SCUNet Swin-UNet and CNN Bottleneck Hybrid Architecture With Multi-Fusion Dens

Show Your Face Restoring Complete Facial Images From Partial Observations

Training-Free Layout Control With Cross-Attention Guidanc

FIRE Food Image to REcipe Generation

Classifying Cable Tendency With Semantic Segmentation by Utilizing Real an

Masking Improves Contrastive Self-Supervised Learning for ConvNets and Saliency Tells

Re-Evaluating LiDAR Scene Flow

Dual Domain Diffusion Guidance for 3D CBCT Metal Artifact Reduction

P2D Plug and Play Discriminator for Accelerating GAN Frameworks

Bipartite Graph Diffusion Model for Human Interaction Generation

Interactive Network Perturbation Between Teacher and Students for Semi-Supervised Semantic

RMFER Semi-Supervised Contrastive Learning for Facial Expression Recognition With Reaction

Slice and Conquer A Planar-to-3D Framework for Efficient Interactive Segmentation

Assessing Neural Network Robustness via Adversarial Pivotal Tuning

Single Domain Generalization via Normalised Cross-Correlation Based Convolutions

PreciseDebias An Automatic Prompt Engineering Approach for Generative AI To

Membership Inference Attack Using Self Influence Functions

Simple Post-Training Robustness Using Test Time Augmentations and Random Forest

BSRAW Improving Blind RAW Image Super-Resolution

ZRG A Dataset for Multimodal 3D Residential Rooftop Understanding

LatentPaint Image Inpainting in Latent Space With Diffusion Models

PsyMo A Dataset for Estimating Self-Reported Psychological Traits From Gait

PECoP Parameter Efficient Continual Pretraining for Action Quality Assessment

TransRadar Adaptive-Directional Transformer for Real-Time Multi-View Radar Semantic Segmentation

Automated Camera Calibration via Homography Estimation With GNNs

IR-FRestormer Iterative Refinement With Fourier-Based Restormer for Accelerated MRI Reconstruction

Harnessing the Power of Multi-Lingual Datasets for Pre-Training Towards Enhancing

Limited Data Unlimited Potential A Study on ViTs Augmented by

FishTrack23 An Ensemble Underwater Dataset for Multi-Object Tracking

RGBT-Dog A Parametric Model and Pose Prior for Canine Body

RGB-X Object Detection via Scene-Specific Fusion Modules

Learning the What and How of Annotation in Video Object

How Do Deepfakes Move Motion Magnification for Deepfake Source Detection

Expanding Hyperspherical Space for Few-Shot Class-Incremental Learning

Ray Deformation Networks for Novel View Synthesis of Refractive Objects

Textual Alchemy CoFormer for Scene Text Understanding

Context in Human Action Through Motion Complementarity

CycleCL Self-Supervised Learning for Periodic Videos

AnyStar Domain Randomized Universal Star-Convex 3D Instance Segmentation

A One-Shot Learning Approach To Document Layout Segmentation of Ancient

Contrastive Learning for Multi-Object Tracking With Transformers

Improving Fairness Using Vision-Language Driven Image Augmentation

Estimating Fog Parameters From an Image Sequence Using Non-Linear Optimisation

PATROL Privacy-Oriented Pruning for Collaborative Inference Against Model Inversion Attacks

CARE Counterfactual-Based Algorithmic Recourse for Explainable Pose Correction

ProS Facial Omni-Representation Learning via Prototype-Based Self-Distillation

Do VSR Models Generalize Beyond LRS3

Learning Saliency From Fixations

Physical-Space Multi-Body Mesh Detection Achieved by Local Alignment and Global

Understanding Dark Scenes by Contrasting Multi-Modal Observations

A Multimodal Benchmark and Improved Architecture for Zero Shot Learning

Semantic Generative Augmentations for Few-Shot Counting

RobustCLEVR A Benchmark and Framework for Evaluating Robustness in Object-Centric

Evidential Uncertainty Quantification A Variance-Based Perspectiv

Mining and Unifying Heterogeneous Contrastive Relations for Weakly-Supervised Actor-Action Segmentation

Towards More Realistic Membership Inference Attacks on Large Diffusion Models

Tracking Skiers From the Top to the Bottom

HMP Hand Motion Priors for Pose and Shape Estimation From

POISE Pose Guided Human Silhouette Extraction Under Occlusions

Real-Time 6-DoF Pose Estimation by an Event-Based Camera Using Activ

Driving Through the Concept Gridlock Unraveling Explainability Bottlenecks in Automat

A Generic and Flexible Regularization Framework for NeRFs

Leveraging Bitstream Metadata for Fast Accurate Generalized Compressed Video Quality

Nested Diffusion Processes for Anytime Image Generation

DR10K Transfer Learning Using Weak Labels for Grading Diabetic Retinopathy

Mixing Gradients in Neural Networks as a Strategy To Enhanc

Exploiting the Signal-Leak Bias in Diffusion Models

Data Augmentation for Object Detection via Controllable Diffusion Models

Dynamic Multimodal Information Bottleneck for Multimodality Classification

Face Presentation Attack Detection by Excavating Causal Clues and Adapting

CryoRL Reinforcement Learning Enables Efficient Cryo-EM Data Collection

DeVos Flow-Guided Deformable Transformer for Video Object Segmentation

Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Languag

Seeing Stars Learned Star Localization for Narrow-Field Astrometry

3D Face Style Transfer With a Hybrid Solution of NeRF

RankDVQA Deep VQA Based on Ranking-Inspired Hybrid Training

MagneticPillars Efficient Point Cloud Registration Through Hierarchized Bird's-Eye-View Cell Correspondenc

Deep Optics for Optomechanical Control Policy Design

dacl10k Benchmark for Semantic Bridge Damage Segmentation

Unsupervised Event-Based Video Reconstruction

InfraParis A Multi-Modal and Multi-Task Autonomous Driving Dataset

Automated Sperm Assessment Framework and Neural Network Specialized for Sperm

DTrOCR Decoder-Only Transformer for Optical Character Recognition

Few-Shot Generative Model for Skeleton-Based Human Action Synthesis Using Cross-Domain

Unsupervised Model-Based Learning for Simultaneous Video Deflickering and Deblotching

AssemblyNet A Point Cloud Dataset and Benchmark for Predicting Part

Generalizing to Unseen Domains in Diabetic Retinopathy Classification

Unified Concept Editing in Diffusion Models

An Empirical Investigation Into Benchmarking Model Multiplicity for Trustworthy Machin

CLIPAG Towards Generator-Free Text-to-Image Generation

Self-Supervised Representation Learning With Cross-Context Learning Between Global and Hypercolumn

STEP - Towards Structured Scene-Text Spotting

SphereCraft A Dataset for Spherical Keypoint Detection Matching and Cam

Towards a Dynamic Vision Sensor-Based Insect Camera T

FacadeNet Conditional Facade Synthesis via Selective Editing

Co-Speech Gesture Detection Through Multi-Phase Sequence Labeling

SigmML Metric Meta-Learning for Writer Independent Offline Signature Verification in

Do We Still Need Non-Maximum Suppression Accurate Confidence Estimates an

Beyond RGB A Real World Dataset for Multispectral Imaging in

So You Think You Can Track

Plasticity-Optimized Complementary Networks for Unsupervised Continual Learning

Domain Aligned CLIP for Few-Shot Classification

Separable Self and Mixed Attention Transformers for Efficient Object Tracking

SynergyNet Bridging the Gap Between Discrete and Continuous Representations fo

ISAR A Benchmark for Single- and Few-Shot Object Instance Segmentation

Active Batch Sampling for Multi-Label Classification With Binary User Feedback

What's in the Flow Exploiting Temporal Motion Cues for Unsupervis

Learning Intra-Class Multimodal Distributions With Orthonormal Matrices

PressureVision Estimating Fingertip Pressure From Diverse RGB Images

WATCH Wide-Area Terrestrial Change Hypercub

The Paleographer's Eye ex machina Using Computer Vision To Assist

TIAM - A Metric for Evaluating Alignment in Text-to-Image Generation

JOADAA Joint Online Action Detection and Action Anticipation

Boosting Weakly Supervised Object Detection Using Fusion and Priors From

Reducing the Side-Effects of Oscillations in Training of Quantized YOLO

Robust Object Detection in Challenging Weather Conditions

Torque Based Structured Pruning for Deep Neural Network

You Can Run but Not Hide Improving Gait Recognition With

Deep Metric Learning With Chance Constraints

Single Frame Semantic Segmentation Using Multi-Modal Spherical Images

Complex Organ Mask Guided Radiology Report Generation

Solving the Plane-Sphere Ambiguity in Top-Down Structure-From-Motion

Tracking Tiny Insects in Cluttered Natural Environments Using Refinable Recurrent

Watch Where You Head A View-Biased Domain Gap in Gait

BoostRad Enhancing Object Detection by Boosting Radar Reflections

Efficient MAE Towards Large-Scale Vision Transformers

Hybrid Neural Diffeomorphic Flow for Shape Representation and Generation vi

ProxEdit Improving Tuning-Free Real Image Editing With Proximal Guidanc

Diffusion-Based Generation of Histopathological Whole Slide Images at a Gigapixel

LInKs Lifting Independent Keypoints - Partial Pose Lifting for Occlusion

Learning To Generate Training Datasets for Robust Semantic Segmentation

FinderNet A Data Augmentation Free Canonicalization Aided Loop Detection an

Improving Graph Networks Through Selection-Based Convolution

Text-Guided Face Recognition Using Multi-Granularity Cross-Modal Contrastive Learning

Bridging Generalization Gaps in High Content Imaging Through Online Self-Supervis

ArcAid Analysis of Archaeological Artifacts Using Drawings

Attentive Prototypes for Source-Free Unsupervised Domain Adaptive 3D Object Detection

Active Learning With Task Consistency and Diversity in Multi-Task Networks

Monocular 3D Object Detection With LiDAR Guided Semi Supervised Activ

NITEC Versatile Hand-Annotated Eye Contact Dataset for Ego-Vision Interaction

Registered and Segmented Deformable Object Reconstruction From a Single View

PromptonomyViT Multi-Task Prompt Learning Improves Video Transformers Using Synthetic Scen

Prototype Learning for Explainable Brain Age Prediction

LidarCLIP or How I Learned To Talk to Point Clouds

Learning Transferable Representations for Image Anomaly Localization Using Dense Pretraining

Sound3DVDet 3D Sound Source Detection Using Multiview Microphone Array an

MS-EVS Multispectral Event-Based Vision for Deep Learning Based Face Detection

CLID Controlled-Length Image Descriptions With Limited Dat

Random Walks for Temporal Action Segmentation With Timestamp Supervision

Rotation-Constrained Cross-View Feature Fusion for Multi-View Appearance-Based Gaze Estimation

Learn To Unlearn for Deep Neural Networks Minimizing Unlearning Interferenc

CLRerNet Improving Confidence of Lane Detection With LaneIoU

Concept-Centric Transformers Enhancing Model Interpretability Through Object-Centric Concept Learning Within

Robust Eye Blink Detection Using Dual Embedding Video Vision Transform

D4 Detection of Adversarial Diffusion Deepfakes Using Disjoint Ensembles

Framework-Agnostic Semantically-Aware Global Reasoning for Segmentation

Multi-Modal Gaze Following in Conversational Scenarios

Natural Light Can Also Be Dangerous Traffic Sign Misinterpretation Un

Removing the Quality Tax in Controllable Face Generation

Scale-Adaptive Feature Aggregation for Efficient Space-Time Video Super-Resolution

Semantic Fusion Augmentation and Semantic Boundary Detection A Novel Approach

SOAP Cross-Sensor Domain Adaptation for 3D Object Detection Using Stationary

Bias and Diversity in Synthetic-Based Face Recognition

Efficient Explainable Face Verification Based on Similarity Score Argument Backpropagation

Are Natural Domain Foundation Models Useful for Medical Image Classification

Synthesizing Anyone Anywhere in Any Pos

CSAM A 2.5D Cross-Slice Attention Module for Anisotropic Volumetric Medical

Expanding Expressiveness of Diffusion Models With Limited Data via Self-Distillation

Embodied Human Activity Recognition

Learning To Adapt CLIP for Few-Shot Monocular Depth Estimation

ReCLIP Refine Contrastive Language Image Pre-Training With Source Free Domain

Temporal Context Enhanced Referring Video Object Segmentation

Booster-SHOT Boosting Stacked Homography Transformations for Multiview Pedestrian Detection With

EASUM Enhancing Affective State Understanding Through Joint Sentiment and Emotion

ReConPatch Contrastive Patch Representation Learning for Industrial Anomaly Detection

Adaptive Deep Neural Network Inference Optimization With EENet

Tunable Hybrid Proposal Networks for the Open Worl

Think Before You Simulate Symbolic Reasoning To Orchestrate Neural Computation

Learnable Cube-Based Video Encryption for Privacy-Preserving Action Recognition

Visually Guided Audio Source Separation With Meta Consistency Learning

Specular Object Reconstruction Behind Frosted Glass by Differentiable Rendering

Controlling Rate Distortion and Realism Towards a Single Comprehensive Neural

CCMR High Resolution Optical Flow Estimation via Coarse-To-Fine Context-Guided Motion

Stochastic Binary Network for Universal Domain Adaptation

M33D Learning 3D Priors Using Multi-Modal Masked Autoencoders for 2D

Composite Diffusion whole >= Σparts

Robust Unsupervised Domain Adaptation Through Negative-View Regularization

Designing a Hybrid Neural System To Learn Real-World Crack Segmentation

Text-to-Image Models for Counterfactual Explanations A Black-Box Approach

WaveMixSR Resource-Efficient Neural Network for Image Super-Resolution

EResFD Rediscovery of the Effectiveness of Standard Convolution for Lightweight

Training-Free Content Injection Using H-Space in Diffusion Models

USDN A Unified Sample-Wise Dynamic Network With Mixed-Precision and Early-Exit

Back to Optimization Diffusion-Based Zero-Shot 3D Human Pose Estimation

Neural Image Compression Using Masked Sparse Visual Representation

Army of Thieves Enhancing Black-Box Model Extraction via Ensemble Bas

iBARLE imBalance-Aware Room Layout Estimation

Unsupervised 3D Pose Estimation With Non-Rigid Structure-From-Motion Modeling

Semantic Labels-Aware Transformer Model for Searching Over a Large Collection

S3AD Semi-Supervised Small Apple Detection in Orchard Environments

High-Fidelity Pseudo-Labels for Boosting Weakly-Supervised Segmentation

Iterative Multi-Granular Image Editing Using Diffusion Models

ConfTrack Kalman Filter-Based Multi-Person Tracking by Utilizing Confidence Score o

SC-MIL Supervised Contrastive Multiple Instance Learning for Imbalanced Classification in

Improving Fairness in Deepfake Detection

Robust Feature Learning and Global Variance-Driven Classifier Alignment for Long-Tail

Intrinsic Hand Avatar Illumination-Aware Hand Appearance and Shape Reconstruction From

Critical Gap Between Generalization Error and Empirical Error in Activ

MetaSeg MetaFormer-Based Global Contexts-Aware Network for Efficient Semantic Segmentation

Privacy-Enhancing Person Re-Identification Framework - A Dual-Stage Approach

ShadowSense Unsupervised Domain Adaptation and Feature Fusion for Shadow-Agnostic T

HaGRID -- HAnd Gesture Recognition Image Dataset

Deep Visual-Genetic Biometrics for Taxonomic Classification of Rare Species

The Background Also Matters Background-Aware Motion-Guided Objects Discovery

Real-Time Weakly Supervised Video Anomaly Detection

AvatarOne Monocular 3D Human Animation

Synergizing Contrastive Learning and Optimal Transport for 3D Point Clou

Label Augmentation As Inter-Class Data Augmentation for Conditional Image Synthesis

Revisiting Latent Space of GAN Inversion for Robust Real Imag

Soft Curriculum for Learning Conditional GANs With Noisy-Labeled and Uncurat

MIDAS Mixing Ambiguous Data With Soft Labels for Dynamic Facial

INCODE Implicit Neural Conditioning With Prior Knowledge Embeddings

Robust TRISO-Fueled Pebble Identification by Digit Recognition

Leveraging Synthetic Data To Learn Video Stabilization Under Adverse Conditions

Estimating Blood Alcohol Level Through Facial Features for Driver Impairment

Graph Neural Networks for End-to-End Information Extraction From Handwritten Documents

A Hybrid Graph Network for Complex Activity Detection in Video

CamoFocus Enhancing Camouflage Object Detection With Split-Feature Focal Modulation an

Spectroformer Multi-Domain Query Cascaded Transformer Network for Underwater Image Enhancement

Lightweight Delivery Detection on Doorbell Cameras

Improving Normalization With the James-Stein Estimato

Adaptive Latent Diffusion Model for 3D Medical Image to Imag

A* Atrous Spatial Temporal Action Recognition for Real Time Applications

Controllable Text-to-Image Synthesis for Multi-Modality MR Images

Efficient Semantic Matching With Hypercolumn Correlation

Enhancing Diverse Intra-Identity Representation for Visible-Infrared Person Re-Identification

Exploring Adversarial Robustness of Vision Transformers in the Spectral Perspectiv

Human Motion Aware Text-to-Video Generation With Explicit Camera Control

Implicit Neural Image Stitching With Enhanced and Blended Feature Reconstruction

Learning Residual Elastic Warps for Image Stitching Under Dirichlet Boundary

LensNeRF Rethinking Volume Rendering Based on Thin-Lens Camera Model

MICS Midpoint Interpolation To Learn Compact and Separated Representations fo

Offline-to-Online Knowledge Distillation for Video Instance Segmentation

Randomized Adversarial Style Perturbations for Domain Generalization

Token Fusion Bridging the Gap Between Token Pruning and Token

Out-of-Distribution Detection With Logical Reasoning

Masked Event Modeling Self-Supervised Pretraining for Event Cameras

Spatio-Temporal Filter Analysis Improves 3D-CNN for Action Classification

SGRec3D Self-Supervised 3D Scene Graph Learning via Object-Level Scene Reconstruction

RecycleNet Latent Feature Recycling Leads to Iterative Decision Refinement

Multi-Class Segmentation From Aerial Views Using Recursive Noise Diffusion

Top-Down Beats Bottom-Up in 3D Instance Segmentation

SimA Simple Softmax-Free Attention for Vision Transformers

ZIGNeRF Zero-Shot 3D Scene Representation With Invertible Generative Neural Radianc

MAELi Masked Autoencoder for Large-Scale LiDAR Point Clouds

Image Denoising and the Generative Accumulation of Photons

ATS Adaptive Temperature Scaling for Enhancing Out-of-Distribution Detection Methods

AU-Aware Dynamic 3D Face Reconstruction From Videos With Transform

Textron Weakly Supervised Multilingual Text Detection Through Data Programming

C2AIR Consolidated Compact Aerial Image Haze Removal

Learning to Detour Shortcut Mitigating Augmentation for Weakly Supervised Semantic

Self-Supervised Learning of Semantic Correspondence Using Web Videos

A Generative Multi-Resolution Pyramid and Normal-Conditioning 3D Cloth Draping

Empowering Unsupervised Domain Adaptation With Large-Scale Pre-Trained Vision-Language Models

Gradient-Guided Knowledge Distillation for Object Detectors

Fast Diffusion EM A Diffusion Model for Blind Inverse Problems

ENTED Enhanced Neural Texture Extraction and Distribution for Reference-Based Blin

Label-Free Synthetic Pretraining of Object Detectors

Adaptive Manifold for Imbalanced Transductive Few-Shot Learning

GLAD Global-Local View Alignment and Background Debiasing for Unsupervised Video

Hard Sample-Aware Consistency for Low-Resolution Facial Expression Recognition

HELA-VFA A Hellinger Distance-Attention-Based Feature Aggregation Network for Few-Shot Classification

Meta-Learned Kernel for Blind Super-Resolution Kernel Estimation

PIDiffu Pixel-Aligned Diffusion Model for High-Fidelity Clothed Human Reconstruction

PoseDiff Pose-Conditioned Multimodal Diffusion Model for Unbounded Scene Synthesis From

Pruning From Scratch via Shared Pruning Module and Nuclear Norm-Bas

RADIO Reference-Agnostic Dubbing Video Synthesis

Re-VoxelDet Rethinking Neck and Head Architectures for High-Performance Voxel-Based 3D

Real-Time User-Guided Adaptive Colorization With Vision Transform

Semi-Supervised Scene Change Detection by Distillation From Feature-Metric Alignment

Sharp-NeRF Grid-Based Fast Deblurring Neural Radiance Fields Using Sharpness Prio

UGPNet Universal Generative Prior for Image Restoration

UNSPAT Uncertainty-Guided SpatioTemporal Transformer for 3D Human Pose and Sh

Self-Sampling Meta SAM Enhancing Few-Shot Medical Image Segmentation With Meta-Learning

Learning to Read Analog Gauges from Synthetic Dat

Linking Convolutional Kernel Size to Generalization Bias in Face Analysis

Multi-View 3D Object Reconstruction and Uncertainty Modelling With Neural Sh

Progressive Hypothesis Transformer for 3D Human Mesh Recovery

CAMOT Camera Angle-Aware Multi-Object Tracking

MetaVers Meta-Learned Versatile Representations for Personalized Federated Learning

Common Diffusion Noise Schedules and Sample Steps Are Flaw

Ego2HandsPose A Dataset for Egocentric Two-Hand 3D Global Pose Estimation

FastSR-NeRF Improving NeRF Efficiency on Consumer Devices With a Simpl

MPT Mesh Pre-Training With Transformers for Human Pose and Mesh

Restoring Degraded Old Films With Recursive Recurrent Transformer Networks

Spiking Neural Networks for Active Time-Resolved SPAD Imaging

Annotation-Free Audio-Visual Segmentation

Bi-Directional Training for Composed Image Retrieval via Text Prompt Learning

BPKD Boundary Privileged Knowledge Distillation for Semantic Segmentation

Detecting Content Segments From Online Sports Streaming Events Challenges an

Dynamic Token-Pass Transformers for Semantic Segmentation

Efficient Feature Distillation for Zero-Shot Annotation Object Detection

FarSight A Physics-Driven Whole-Body Biometric System at Large Distance an

Generation of Upright Panoramic Image From Non-Upright Panoramic Imag

Global Occlusion-Aware Transformer for Robust Stereo Matching

LatentDR Improving Model Generalization Through Sample-Aware Latent Degradation and Restoration

Let the Beat Follow You - Creating Interactive Drum Sounds

Rethinking Knowledge Distillation With Raw Features for Semantic Segmentation

Revisiting Token Pruning for Object Detection and Instance Segmentation

Tackling Data Bias in MUSIC-AVQA Crafting a Balanced Dataset fo

U3DS3 Unsupervised 3D Semantic Scene Segmentation

Wakening Past Concepts Without Past Data Class-Incremental Learning From Onlin

Bridging the Gap Between Multi-Focus and Multi-Modal A Focused Integration

Controlling Character Motions Without Observable Driving Sourc

Controlling Virtual Try-On Pipeline Through Rendering Policies

CPSeg Finer-Grained Image Semantic Segmentation via Chain-of-Thought Language Prompting

Disentangled Pre-Training for Image Matting

Efficient Layout-Guided Image Inpainting for Mobile Us

Enforcing Sparsity on Latent Space for Robust and Explainable Representations

Mitigate Domain Shift by Primary-Auxiliary Objectives Association for Generalizing Person

Neural Style Protection Counteracting Unauthorized Neural Style Trans

OTAS Unsupervised Boundary Detection for Object-Centric Temporal Action Segmentation

PromptAD Zero-Shot Anomaly Detection Using Text Prompts

Repetitive Action Counting With Motion Feature Learning

Robust Source-Free Domain Adaptation for Fundus Image Segmentation

SDNet An Extremely Efficient Portrait Matting Model via Self-Distillation

Steering Prototypes With Prompt-Tuning for Rehearsal-Free Continual Learning

Task-Oriented Human-Object Interactions Generation With Implicit Neural Representations

TCP Triplet Contrastive-Relationship Preserving for Class-Incremental Learning

Video Instance Matting

VMFormer End-to-End Video Matting With Transform

A Neural Height-Map Approach for the Binocular Photometric Stereo Problem

Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis

PlantPlotGAN A Physics-Informed Generative Adversarial Network for Plant Disease Prediction

Differentially Private Video Activity Recognition

Zero-Shot Video Moment Retrieval From Frozen Vision-Language Models

Deblur-NSFF Neural Scene Flow Fields for Blurry Dynamic Scenes

Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation

SBCFormer Lightweight Network Capable of Full-Size ImageNet Classification at 1

SLoSH Set Locality Sensitive Hashing via Sliced-Wasserstein Embeddings

Towards Visual Saliency Explanations of Face Verification

Modality-Aware Representation Learning for Zero-Shot Sketch-Based Image Retrieval

Uncertainty-Weighted Loss Functions for Improved Adversarial Attacks on Semantic Segmentation

CL-MAE Curriculum-Learned Masked Autoencoders

Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation

SSVOD Semi-Supervised Video Object Detection With Sparse Annotations

OE-CTST Outlier-Embedded Cross Temporal Scale Transformer for Weakly-Supervised Video Anomaly

LIVENet A Novel Network for Real-World Low-Light Image Denoising an

Taming Normalizing Flows

One Style Is All You Need To Generate a Video

Mini but Mighty Finetuning ViTs With Mini Adapters

MonoProb Self-Supervised Monocular Depth Estimation With Interpretable Uncertainty

Universal Test-Time Adaptation Through Weight Ensembling Diversity Weighting and Prio

Training-Based Model Refinement and Representation Disagreement for Semi-Supervised Object Detection

Object Aware Contrastive Prior for Interactive Image Segmentation

Indoor Visual Localization Using Point and Line Correspondences in Dens

A Geometry Loss Combination for 3D Human Pose Estimation

Beyond SOT Tracking Multiple Generic Objects at Onc

Learning Low-Rank Latent Spaces With Simple Deterministic Autoencoder Theoretical an

CVTHead One-Shot Controllable Head Avatar With Vertex-Feature Transform

MACP Efficient Model Adaptation for Cooperative Perception

Joint 3D Shape and Motion Estimation From Rolling Shutter Light-Fiel

HalluciDet Hallucinating RGB Modality for Person Detection Through Privileged Information

Context-Based Interpretable Spatio-Temporal Graph Convolutional Network for Human Motion Forecasting

Stereo Conversion With Disparity-Aware Warping Compositing and Inpainting

MotionAGFormer Enhancing 3D Human Pose Estimation With a Transformer-GCNFormer Network

On the Fly Neural Style Smoothing for Risk-Averse Domain Generalization

HyperMix Out-of-Distribution Detection and Classification in Few-Shot Settings

Latent Feature-Guided Diffusion Models for Shadow Removal

Fixing Overconfidence in Dynamic Neural Networks

Increasing Biases Can Be More Efficient Than Increasing Weights

Hyperbolic vs Euclidean Embeddings in Few-Shot Learning Two Sides o

Wino Vidi Vici Conquering Numerical Instability of 8-Bit Winograd Convolution

Bag of Tricks for Fully Test-Time Adaptation

Diff2Lip Audio Conditioned Diffusion Models for Lip-Synchronization

Small Objects Matters in Weakly-Supervised Semantic Segmentation

Prompting Classes Exploring the Power of Prompt Class Learning in

Self-Supervised Learning for Place Representation Generalization Across Appearance Changes

Interactive Segmentation for Diverse Gesture Types Without Context

CAD - Contextual Multi-Modal Alignment for Dynamic AVQ

SEMA Semantic Attention for Capturing Long-Range Dependencies in Egocentric Lifelogs

PatchRefineNet Improving Binary Segmentation by Incorporating Signals From Optimal Patch-Wis

BigSmall Efficient Multi-Task Learning for Disparate Spatial and Temporal Physiological

Reverse Knowledge Distillation Training a Large Model Using a Small

TAMPAR Visual Tampering Detection for Parcel Logistics in Postal Supply

Implicit Neural Representation for Change Detection

Diverse Imagenet Models Transfer Bett

MFT Long-Term Tracking of Every Pixel

ICF-SRSR Invertible Scale-Conditional Function for Self-Supervised Real-World Single Image Super-Resolution

Contrastive Viewpoint-Aware Shape Learning for Long-Term Person Re-Identification

Debiasing Calibrating and Improving Semi-Supervised Learning Performance via Simple Ensembl

Diffusion in the Dark A Diffusion Model for Low-Light Text

Domain Generalisation via Risk Distribution Matching

FocusTune Tuning Visual Localization Through Focus-Guided Sampling

Robust Learning via Conditional Prevalence Adjustment

SequenceMatch Revisiting the Design of Weak-Strong Augmentations for Semi-Supervised Learning

VideoFACT Detecting Video Forgeries Using Attention Scene Context and Forensic

MoP-CLIP A Mixture of Prompt-Tuned CLIP Models for Domain Incremental

Generalization by Adaptation Diffusion-Based Domain Extension for Domain-Generalized Semantic Segmentation

Triplet Attention Transformer for Spatiotemporal Predictive Learning

HashReID Dynamic Network With Binary Codes for Efficient Person Re-Identification

Effective Restoration of Source Knowledge in Continual Test Time Adaptation

3D-Aware Talking-Head Video Motion Trans

Scene Text Image Super-Resolution Based on Text-Conditional Diffusion Models

Prototypical Contrastive Network for Imbalanced Aerial Image Segmentation

StyleGenes Discrete and Efficient Latent Distributions for GANs

Automated Monitoring of Ear Biting in Pigs by Tracking Individuals

Defending Object Detection Models Against Image Distortions

FRoG-MOT Fast and Robust Generic Multiple-Object Tracking by IoU an

DiffBody Diffusion-Based Pose and Shape Editing of Human Images

Unsupervised Domain Adaptation of MRI Skull-Stripping Trained on Adult Dat

Guided Cluster Aggregation A Hierarchical Approach to Generalized Category Discovery

MarsLS-Net Martian Landslides Segmentation Network and Benchmark Dataset

Domain Adaptive 3D Shape Retrieval From Monocular Images

Exploring the Impact of Rendering Method and Motion Quality on

Learning Visual Body-Shape-Aware Embeddings for Fashion Compatibility

Revisiting Pixel-Level Contrastive Pre-Training on Scene Images

Synthesizing Coherent Story With Auto-Regressive Latent Diffusion Models

Zero-Shot Building Attribute Extraction From Large-Scale Vision and Language Models

Can CLIP Help Sound Source Localization

Fully-Automatic Reflection Removal for 360-Degree Images

Grafting Vision Transformers

Hard-Label Based Small Query Black-Box Adversarial Attack

Layer-Wise Auto-Weighting for Non-Stationary Test-Time Adaptation

Localization and Manipulation of Immoral Visual Cues for Safe Text-to-Imag

Point-DynRF Point-Based Dynamic Radiance Fields From a Monocular Video

Shape-Guided Diffusion With Inside-Outside Attention

CrashCar101 Procedural Generation for Damage Assessment

Motion Matters Neural Motion Transfer for Better Camera Physiological Measurement

PHG-Net Persistent Homology Guided Medical Image Classification

CGAPoseNetGCAN A Geometric Clifford Algebra Network for Geometry-Aware Camera Pos

StyleAvatar Stylizing Animatable Head Avatars

An Analysis of Initial Training Strategies for Exemplar-Free Class-Incremental Learning

Simple Token-Level Confidence Improves Caption Correctness

Embedding Task Structure for Action Detection

Frequency Attention for Knowledge Distillation

I-AI A Controllable Interpretable AI System for Decoding Radiologists

LP-OVOD Open-Vocabulary Object Detection by Linear Probing

MixtureGrowth Growing Neural Networks by Recombining Learned Parameters

NVAutoNet Fast and Accurate 360° 3D Visual Perception for Sel

Fast and Interpretable Face Identification for Out-of-Distribution Data Using Vision

ZEETAD Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action

Multi-Level Attention Aggregation for Aesthetic Face Relighting

Beyond Classification Definition and Density-Based Estimation of Calibration in Object

Towards Accurate Disease Segmentation in Plant Images A Comprehensive Dataset

ConeQuest A Benchmark for Cone Segmentation on Mars

DISCO Distributed Inference With Sparse Communications

Revolutionize the Oceanic Drone RGB Imagery With Pioneering Sun Glint

Shape-Biased CNNs Are Not Always Superior in Out-of-Distribution Robustness

Design Choices for Enhancing Noisy Student Self-Training

Vision Transformer for Multispectral Satellite Imagery Advancing Landcover Classification

Online Class-Incremental Learning for Real-World Food Image Classification

ENIGMA-51 Towards a Fine-Grained Understanding of Human Behavior in Industrial

G-CASCADE Efficient Cascaded Graph Convolutional Decoding for 2D Medical Imag

MIST Medical Image Segmentation Transformer With Convolutional Attention Mixing CAM

Semi-Supervised Semantic Depth Estimation Using Symbiotic Transformer and NearFarMix Augmentation

Image Labels Are All You Need for Coarse Seagrass Segmentation

Towards Realistic Generative 3D Face Models

Fingervein Verification Using Convolutional Multi-Head Attention Network

Multispectral Imaging for Differential Face Morphing Attack Detection A Preliminary

Source-Guided Similarity Preservation for Online Person Re-Identification

Continual Atlas-Based Segmentation of Prostate MRI

Activity-Based Early Autism Diagnosis Using a Multi-Dataset Supervised Contrastive Learning

MaskConver Revisiting Pure Convolution Model for Panoptic Segmentation

Attention-Guided Prototype Mixing Diversifying Minority Context on Imbalanced Whole Sli

GC-VTON Predicting Globally Consistent and Occlusion Aware Local Flows With

MOPA Modular Object Navigation With PointGoal Agents

Towards Domain-Aware Knowledge Distillation for Continual Model Generalization

Differentiable JPEG The Devil Is in the Details

Content-Aware Image Color Editing With Auxiliary Color Restoration Tasks

MuSHRoom Multi-Sensor Hybrid Room Dataset for Joint 3D Reconstruction an

Segment Anything From Spac

VEATIC Video-Based Emotion and Affect Tracking in Context Dataset

Salient Object Detection for Images Taken by People With Vision

MotionGPT Human Motion Synthesis With Improved Diversity and Realism vi

Recognition of Unseen Bird Species by Learning From Field Guides

Effects of Markers in Training Datasets on the Accuracy o

Time To Shine Fine-Tuning Object Detection Models With Synthetic Advers

FuseCap Leveraging Large Language Models for Enriched Fused Image Captions

ClipSitu Effectively Leveraging CLIP for Conditional Predictions in Situation Recognition

Efficient Expansion and Gradient Based Task Inference for Replay F

Interaction Region Visual Transformer for Egocentric Action Anticipation

Describe Images in a Boring Way Towards Cross-Modal Sarcasm Generation

TriCoLo Trimodal Contrastive Loss for Text To Shape Retrieval

On the Importance of Large Objects in CNN Based Object

Rank2Tell A Multimodal Driving Dataset for Joint Importance Ranking an

Leveraging Task-Specific Pre-Training To Reason Across Images and Videos

Auto-BPA An Enhanced Ball-Pivoting Algorithm With Adaptive Radius Using Contextual

MAdVerse A Hierarchical Dataset of Multi-Lingual Ads From Diverse Sources

Enhancing Multimodal Compositional Reasoning of Visual Language Models With Generativ

POP-VQA - Privacy Preserving On-Device Personalized Visual Question Answering

SICKLE A Multi-Sensor Satellite Imagery Dataset Annotated With Multiple Key

On Manipulating Scene Text in the Wild With Diffusion Models

Aligning Non-Causal Factors for Transformer-Based Source-Free Domain Adaptation

A Visual Active Search Framework for Geospatial Exploration

Benchmark Generation Framework With Customizable Distortions for Image Classifier Robustness

Open-Set Object Detection by Aligning Known Class Representations

Collage Diffusion

BirdSAT Cross-View Contrastive Masked Autoencoders for Bird Species Classification an

Edge Inference With Fully Differentiable Quantized Mixed Precision Neural Networks

Detection Defenses An Empty Promise Against Adversarial Patch Attacks on

IndustReal A Dataset for Procedure Step Recognition Handling Execution Errors

Identifying Label Errors in Object Detection Datasets by Loss Inspection

OOD Aware Supervised Contrastive Learning

REALM Robust Entropy Adaptive Loss Minimization for Improved Single-Sample Test-Tim

Ordinal Classification With Distance Regularization for Robust Brain Age Prediction

IDD-AW A Benchmark for Safe and Robust Segmentation of Driv

RIMeshGNN A Rotation-Invariant Graph Neural Network for Mesh Classification

Improved Topological Preservation in 3D Axon Segmentation and Centerline Detection

Analyzing the Domain Shift Immunity of Deep Homography Estimation

Assist Is Just As Important as the Goal Image Resurfacing

Favoring One Among Equals - Not a Good Idea Many-to-On

CXR-IRGen An Integrated Vision and Language Model for the Generation

DiffCLIP Leveraging Stable Diffusion for Language Grounded 3D Classification

Med-DANet V2 A Flexible Dynamic Architecture for Efficient Medical Volumetric

Multitask Vision-Language Prompt Tuning

Towards Diverse and Consistent Typography Generation

Video-kMaX A Simple Unified Approach for Online and Near-Online Video

Egocentric Action Recognition by Capturing Hand-Object Contact and Object Stat

Benchmarking Out-of-Distribution Detection in Visual Question Answering

Conditional Velocity Score Estimation for Image Restoration

Few-Shot Shape Recognition by Learning Deep Shape-Aware Features

Training-Free Object Counting With Prompts

Have We Ever Encountered This Before Retrieving Out-of-Distribution Road Obstacles

Asymmetric Image Retrieval With Cross Model Compatible Ensembles

FPGAN-Control A Controllable Fingerprint Generator for Training With Synthetic Dat

ArcGeo Localizing Limited Field-of-View Images Using Cross-View Matching

Opinion Unaware Image Quality Assessment via Adversarial Convolutional Variational Autoenco

Vikriti-ID A Novel Approach for Real Looking Fingerprint Data-Set Generation

Deep Plug-and-Play Nighttime Non-Blind Deblurring With Saturated Pixel Handling Schemes

Joint Depth Prediction and Semantic Segmentation With Multi-View SAM

Lightweight Thermal Super-Resolution and Object Detection for Robust Perception in

PAIR Perception Aided Image Restoration for Natural Driving Conditions

Brainomaly Unsupervised Neurologic Disease Detection Utilizing Unannotated T1-Weighted Brain MR

Uncertainty Estimation in Instance Segmentation With Star-Convex Shapes

LipAT Beyond Style Transfer for Controllable Neural Simulation of Lipstick

Discriminator-Free Unsupervised Domain Adaptation for Multi-Label Image Classification

Learning Robust Deep Visual Representations From EEG Brain Recordings

SynthProv Interpretable Framework for Profiling Identity Leakag

Data-Centric Debugging Mitigating Model Failures via Targeted Image Retrieval

Hardware Aware Evolutionary Neural Architecture Search Using Representation Similarity Metric

Deep Image Fingerprint Towards Low Budget Synthetic Image Detection an

Gradient Coreset for Federated Learning

Computer Vision on the Edge Individual Cattle Identification in Real-Tim

MSCC Multi-Scale Transformers for Camera Calibration

Overcoming Catastrophic Forgetting for Multi-Label Class-Incremental Learning

StyleGAN-Fusion Diffusion Guided Domain Adaptation of Image Generators

SyntheWorld A Large-Scale Synthetic Dataset for Land Cover Mapping an

Single-Image Deblurring Trajectory and Shape Recovery of Fast Moving Objects

Visual Narratives Large-Scale Hierarchical Classification of Art-Historical Images

pSTarC Pseudo Source Guided Target Clustering for Fully Test-Time Adaptation

Learning Generalizable Perceptual Representations for Data-Efficient No-Reference Image Quality Assessment

OmniVec Learning Robust Representations With Cross Modal Sharing

Holistic Representation Learning for Multitask Trajectory Anomaly Detection

Training Ensembles With Inliers and Outliers for Semi-Supervised Active Learning

Diffused Heads Diffusion Models Beat GANs on Talking-Face Generation

A Closer Look at Robustness of Vision Transformers to Backdoo

Diffuse and Restore A Region-Adaptive Diffusion Model for Identity-Preserving Blin

LaughTalk Expressive 3D Talking Head Generation With Laught

Defense Against Adversarial Cloud Attack on Remote Sensing Salient Object

Improved Techniques for Quantizing Deep Networks With Adaptive Bit-Widths

NeRFEditor Differentiable Style Decomposition for 3D Scene Editing

Rethinking Visibility in Human Pose Estimation Occluded Pose Reasoning vi

RSMPNet Relationship Guided Semantic Map Prediction

Towards Better Structured Pruning Saliency by Reorganizing Convolution

FastCLIPstyler Optimisation-Free Text-Based Image Style Transfer Using Style Representations

GRIT GAN Residuals for Paired Image-to-Image Translation

Face Identity-Aware Disentanglement in StyleGAN

Adapt Your Teacher Improving Knowledge Distillation for Exemplar-Free Continual Learning

Few-Shot Event Classification in Images Using Knowledge Graphs for Prompting

Diffusion Models Meet Image Counter-Forensics

Active Transfer Learning for Efficient Video-Specific Human Pose Estimation

Appearance-Based Curriculum for Semi-Supervised Learning With Multi-Angle Unlabeled Dat

Kaizen Practical Self-Supervised Continual Learning With Continual Fine-Tuning

Semantic-Aware Video Representation for Few-Shot Action Recognition

Discovering and Mitigating Biases in CLIP-Based Image Editing

Weakly-Supervised Deepfake Localization in Diffusion-Generated Images

Cross-Domain Few-Shot Incremental Learning for Point-Cloud Recognition

SciOL and MuLMS-Img Introducing a Large-Scale Multimodal Scientific Dataset an

PrivObfNet A Weakly Supervised Semantic Segmentation Model for Data Protection

RGB-D Mapping and Tracking in a Plenoxel Radiance Fiel

Complementary-Contradictory Feature Regularization Against Multimodal Overfitting

360BEV Panoramic Semantic Mapping for Indoor Birds-Eye View

Learning To Compose SuperWeights for Neural Parameter Allocation Search

Lets Observe Them Over Time An Improved Pedestrian Attribute Recognition

GraphGraph A Nested Graph-Based Framework for Early Accident Anticipation

Leveraging Next-Active Objects for Context-Aware Anticipation in Egocentric Videos

C-CLIP Contrastive Image-Text Encoders To Close the Descriptive-Commentative G

Object Re-Identification From Point Clouds

Using Early Readouts To Mediate Featural Bias in Distillation

Permutation-Aware Activity Segmentation via Unsupervised Frame-To-Segment Alignment

PointCT Point Central Transformer Network for Weakly-Supervised Point Cloud Semantic

3D Super-Resolution Model for Vehicle Flow Field Enrichment

Query-Guided Attention in Vision Transformers for Localizing Objects Using

Arbitrary-Resolution and Arbitrary-Scale Face Super-Resolution With Implicit Representation Networks

2D Feature Distillation for Weakly- and Semi-Supervised 3D Semantic Segmentation

Occlusion Sensitivity Analysis With Augmentation Subspace Perturbation in Deep Featu

Controllable Image Synthesis of Industrial Data Using Stable Diffusion

Beyond Document Page Classification Design Datasets and Challenges

MobileNVC Real-Time 1080p Neural Video Compression on a Mobile Devic

GC-MVSNet Multi-View Multi-Scale Geometrically-Consistent Multi-View Stereo

Evaluation of Video Masked Autoencoders Performance and Uncertainty Estimations fo

Causal Feature Alignment Learning To Ignore Spurious Background Features

Can You Even Tell Left From Right Presenting a New

CoD Coherent Detection of Entities From Images With Multiple Modalities

GraphFill Deep Image Inpainting Using Graphs

Meta-Learned Attribute Self-Interaction Network for Continual and Generalized Zero-Shot Learning

Attention Modules Improve Image-Level Anomaly Detection for Industrial Inspection

TEGLO High Fidelity Canonical Texture Mapping From Single-View Images

Toward Planet-Wide Traffic Camera Calibration

Fine-Grained Alignment for Cross-Modal Recipe Retrieval

Improving Open-Set Semi-Supervised Learning With Self-Supervision

3D Human Pose Estimation With Two-Step Mixed-Training Strategy

Continual Test-Time Domain Adaptation via Dynamic Sample Selection

Customizing 360-Degree Panoramas Through Text-to-Image Diffusion Models

Distortion-Disentangled Contrastive Learning

Efficient Transferability Assessment for Selection of Pre-Trained Detectors

FreMIM Fourier Transform Meets Masked Image Modeling for Medical Imag

GazeGNN A Gaze-Guided Graph Neural Network for Chest X-Ray Classification

Hyb-NeRF A Multiresolution Hybrid Encoding for Neural Radiance Fields

Improving the Effectiveness of Deep Generative Dat

Learning Quality Labels for Robust Image Classification

Maximum Knowledge Orthogonality Reconstruction With Gradients in Federated Learning

Multimodality-Guided Image Style Transfer Using Cross-Modal GAN Inversion

Neural Textured Deformable Meshes for Robust Analysis-by-Synthesis

Painterly Image Harmonization via Adversarial Residual Learning

RS2G Data-Driven Scene-Graph Extraction and Embedding for Robust Autonomous Perception

Self-Annotated 3D Geometric Learning for Smeared Points Removal

Sparse Convolutional Networks for Surface Reconstruction From Noisy Point Clouds

TSP-Transformer Task-Specific Prompts Boosted Transformer for Holistic Scene Understanding

VCISR Blind Single Image Super-Resolution With Video Compression Synthetic Dat

Density-Based Flow Mask Integration via Deformable Convolution for Video Peopl

Exploiting CLIP for Zero-Shot HOI Detection Requires Knowledge Distillation at

Interpretable Object Recognition by Semantic Prototype Analysis

Constrained Probabilistic Mask Learning for Task-Specific Undersampled MRI Reconstruction

Approximating Intersections and Differences Between Linear Statistical Shape Models Using

FATE Feature-Agnostic Transformer-Based Encoder for Learning Generalized Embedding Spaces in

HAMMER Learning Entropy Maps To Create Accurate 3D Models in

Best of Both Worlds Learning Arbitrary-Scale Blind Super-Resolution via Dual

From Denoising Training To Test-Time Adaptation Enhancing Domain Generalization fo

Second-Order Graph ODEs for Multi-Agent Trajectory Forecasting

The Growing Strawberries Dataset Tracking Multiple Objects With Biological Development

Gradual Source Domain Expansion for Unsupervised Domain Adaptation

Camera-Independent Single Image Depth Estimation From Defocus Blu

Link Prediction for Flow-Driven Spatial Networks

ECSIC Epipolar Cross Attention for Stereo Image Compression

Sketch-Based Video Object Localization

A Robust Diffusion Modeling Framework for Radar Camera 3D Object

Correlation-Aware Active Learning for Surgery Video Segmentation

HD-Fusion Detailed Text-to-3D Generation Leveraging Multiple Noise Estimation

Learning Better Keypoints for Multi-Object 6DoF Pose Estimation

Masked Collaborative Contrast for Weakly Supervised Semantic Segmentation

MIVC Multiple Instance Visual Component for Visual-Language Models

RPCANet Deep Unfolding RPCA Based Infrared Small Target Detection

CLIP-DIY CLIP Dense Inference Yields Open-Vocabulary Semantic Segmentation For-F

MITFAS Mutual Information Based Temporal Feature Alignment and Sampling fo

PMI Sampler Patch Similarity Guided Frame Selection for Aerial Action

TSA2 Temporal Segment Adaptation and Aggregation for Video Harmonization

DREAM Visual Decoding From Reversing Human Visual System

Beyond Fusion Modality Hallucination-Based Multispectral Fusion for Pedestrian Detection

SAM Fewshot Finetuning for Anatomical Segmentation in Medical Images

Sign Language Production With Latent Motion Transform

Glance To Count Learning To Rank With Anchors for Weakly-Supervis

HDMNet A Hierarchical Matching Network With Double Attention for Large-Scal

DPPMask Masked Image Modeling With Determinantal Point Processes

GIPCOL Graph-Injected Soft Prompting for Compositional Zero-Shot Learning

GTP-ViT Efficient Vision Transformers via Graph-Based Token Propagation

Personalized Face Inpainting With Diffusion Models by Parallel Visual Attention

Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing

Self-Supervised Edge Detection Reconstruction for Topology-Informed 3D Axon Segmentation an

Self-Supervised Relation Alignment for Scene Graph Generation

SpectralCLIP Preventing Artifacts in Text-Guided Style Transfer From a Spectral

Active Learning for Single-Stage Object Detection in UAV Images

Convolutional Masked Image Modeling for Dense Prediction Tasks on Pathology

Foundation Model Assisted Weakly Supervised Semantic Segmentation

Improving Vision-and-Language Reasoning via Spatial Relations Modeling

Latent-Guided Exemplar-Based Image Re-Colorization

MGM-AE Self-Supervised Learning on 3D Shape Using Mesh Graph Mask

PolyMaX General Dense Prediction With Mask Transform

Robust Category-Level 3D Pose Estimation From Diffusion-Enhanced Synthetic Dat

SCoRD Subject-Conditional Relation Detection With Text-Augmented Dat

SimpliMix A Simplified Manifold Mixup for Few-Shot Point Cloud Classification

AFTer-SAM Adapting SAM With Axial Fusion Transformer for Medical Imaging

Universal Semi-Supervised Model Adaptation via Collaborative Consistency Training

Group-Wise Contrastive Bottleneck for Weakly-Supervised Visual Representation Learning

3SD Self-Supervised Saliency Detection With No Labels

Self-Supervised Denoising Transformer With Gaussian Process

Minimizing Layerwise Activation Norm Improves Generalization in Federated Learning

PathLDM Text Conditioned Latent Diffusion Model for Histopathology

Concurrent Band Selection and Traversability Estimation From Long-Wave Hyperspectral Imagery

FIRe Fast Inverse Rendering Using Directional and Signed Distance Functions

Label Shift Estimation for Class-Imbalance Problem A Bayesian Approach

LAVSS Location-Guided Audio-Visual Spatial Audio Separation

Unsupervised Exemplar-Based Image-to-Image Translation and Cascaded Vision Transformers for Tagg

FG-Net Facial Action Unit Detection With Generalizable Pyramidal Features

Augment the Pairs Semantics-Preserving Image-Caption Pair Augmentation for Grounding-Based Vision

Optical Flow Domain Adaptation via Target Style Trans

Real-Time Polyp Detection in Colonoscopy Using Lightweight Transform

Cross-Attention Between Satellite and Ground Views for Enhanced Fine-Grained Robot

FAKD Feature Augmented Knowledge Distillation for Semantic Segmentation

Rethinking Multimodal Content Moderation From an Asymmetric Angle With Mixed-Modality

StreamMapNet Streaming Mapping Network for Vectorized Online HD Map Construction

Understanding Hyperbolic Metric Learning Through Hard Negative Sampling

Denoising and Selecting Pseudo-Heatmaps for Semi-Supervised Human Pose Estimation

DocReal Robust Document Dewarping of Real-Life Images via Attention-Enhanced Control

Evolve Enhancing Unsupervised Continual Learning With Multiple Experts

When 3D Bounding-Box Meets SAM Point Cloud Instance Segmentation With

Refine and Redistribute Multi-Domain Fusion and Dynamic Label Assignment fo

Cheating Depth Enhancing 3D Surface Anomaly Detection via Depth Simulation

Alleviating Foreground Sparsity for Semi-Supervised Monocular 3D Object Detection

Can Vision-Language Models Be a Good Guesser Exploring VLMs fo

Contextual Affinity Distillation for Image Anomaly Detection

D3GU Multi-Target Active Domain Adaptation via Enhancing Domain Alignment

DECDM Document Enhancement Using Cycle-Consistent Diffusion Models

Domain Generalization With Correlated Style Uncertainty

DR2 Disentangled Recurrent Representation Learning for Data-Efficient Speech Video Synthesis

Generated Distributions Are All You Need for Membership Inference Attacks

Handformer2T A Lightweight Regression-Based Model for Interacting Hands Pose Estimation

Improving the Fairness of the Min-Max Game in GANs Training

Improving the Leaking of Augmentations in Data-Efficient GANs via Adaptiv

Incorporating Physics Principles for Precise Human Motion Prediction

Instruct Me More Random Prompting for Visual In-Context Learning

Movie Genre Classification by Language Augmentation and Shot Sampling

Multimodal Channel-Mixing Channel and Spatial Masked AutoEncoder on Facial Action

Object-Centric Video Representation for Long-Term Action Anticipation

On the Quantification of Image Reconstruction Uncertainty Without Training Dat

Open-NeRF Towards Open Vocabulary NeRF Decomposition

Patch-Based Selection and Refinement for Early Object Detection

PGVT Pose-Guided Video Transformer for Fine-Grained Action Recognition

PMVC Promoting Multi-View Consistency for 3D Scene Reconstruction

Preserving Image Properties Through Initializations in Diffusion Models

Semantic Transfer From Head to Tail Enlarging Tail Margin fo

Sequential Transformer for End-to-End Video Text Detection

Text-to-Image Editing by Image Information Removal

WalkFormer Point Cloud Completion via Guided Walks

BALF Simple and Efficient Blur Aware Local Feature Detecto

Deep Subdomain Alignment for Cross-Domain Image Classification

Leveraging the Power of Data Augmentation for Transformer-Based Tracking

Polarimetric PatchMatch Multi-View Stereo

SemST Semantically Consistent Multi-Scale Image Translation via Structure-Texture Alignment

THInImg Cross-Modal Steganography for Presenting Talking Heads in Images

Unsupervised Domain Adaptation for Semantic Segmentation With Pseudo Label Self-Refinement

CAILA Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning

TPSeNCE Towards Artifact-Free Realistic Rain Generation for Deraining and Object

Lightweight Portrait Matting via Regional Attention and Refinement

4K-Resolution Photo Exposure Correction at 125 FPS With 8K Parameters

FELGA Unsupervised Fragment Embedding for Fine-Grained Cross-Modal Association

CATS Combined Activation and Temporal Suppression for Efficient Network Inferenc

Consistent Multimodal Generation via a Unified GAN Framework

ShARc Shape and Appearance Recognition for Person Identification In-the-Wil

SSP Semi-Signed Prioritized Neural Fitting for Surface Reconstruction From Unorient

Unsupervised Graphic Layout Grouping With Transformers

Multimodal Deep Learning for Remote Stress Estimation Using CCT-LSTM

Learning To Recognize Occluded and Small Objects With Partial Inputs

Category: CVPR Guide. Tags: none.

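Copyright Notice

This article is an original work by @冯大仙, published on the 计算机视觉学习笔记 (Computer Vision Study Notes) site. Reproduction without permission is prohibited.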


Last updated: 24-02-29 14:40:43
