Computer vision
LiT (Locked-image Tuning) paper is neat. Trying to understand Vision Transformers. Kornia seems like a great library. Imagen is fascinating.
Links
- OpenCV - Open Source Computer Vision Library. (Web) (OpenCV Course)
- Gluon CV Toolkit - Provides implementations of state-of-the-art (SOTA) deep learning models in computer vision.
- Pythia - Modular framework for vision and language multimodal research. Built on top of PyTorch.
- video-object-removal - Draw a bounding box around an object to remove it from the video.
- GoCV - Go package for computer vision using OpenCV 4 and beyond.
- Sandbox for training convolutional networks for computer vision
- Get started with Computer Vision, Deep Learning, and OpenCV
- TorchCV - PyTorch-Based Framework for Deep Learning in Computer Vision.
- AI Habitat - Flexible, high-performance 3D simulator for Embodied AI research.
- Kornia - Open Source Differentiable Computer Vision Library for PyTorch. (Web)
- Roboflow - Raw images to trained computer vision model. (Article)
- PySlowFast - Open source video understanding codebase from FAIR that provides state-of-the-art video classification models.
- How to Convert a Picture to Numbers
- Awesome Computer Vision
- The Ancient Secrets of Computer Vision (2018)
- Variational Methods for Computer Vision lectures (2013)
- Classy Vision - New end-to-end, PyTorch-based framework for large-scale training of state-of-the-art image and video classification models.
- Meshroom - 3D Reconstruction Software.
- AliceVision - Photogrammetric Computer Vision Framework. (Code) (GitHub)
- PyTorch3d - Provides efficient, reusable components for 3D Computer Vision research with PyTorch. (Web)
- Face Recognition - World's simplest facial recognition API for Python and the command line.
- Deep Hough Voting for 3D Object Detection in Point Clouds
- Point Cloud Library - Standalone, large scale, open project for 2D/3D image and point cloud processing.
- Disappearing-People - Removing people from complex backgrounds in real time using TensorFlow.js in the web browser. (HN)
- Best Practices, code samples, and documentation for Computer Vision
- Computer Vision Basics in Microsoft Excel
- PolyGen: An Autoregressive Generative Model of 3D Meshes (2020)
- Sophus - C++ implementation of Lie Groups using Eigen.
- SOLT - Streaming over lightweight data transformations.
- Awesome Interaction-aware Behavior and Trajectory Prediction
- SynSin: End-to-end View Synthesis from a Single Image (2020) (Code)
- Pixel2Mesh - Generating 3D Mesh Models from Single RGB Images.
- First Order Motion Model for Image Animation (Code)
- PyTorch improved version of TPAMI 2017 paper: Face Alignment in Full Pose Range: A 3D Total Solution
- Learning to See Through Obstructions
- Learning to Cluster Faces on an Affinity Graph (LTC)
- Avatarify - Avatars for Zoom and Skype.
- SPSR - PyTorch implementation of Structure-Preserving Super Resolution with Gradient Guidance.
- OISR-PyTorch - PyTorch implementation of "ODE-inspired Network Design for Single Image Super-Resolution".
- 3D Photography using Context-aware Layered Depth Inpainting
- CenterMask : Real-Time Anchor-Free Instance Segmentation
- Interview with Dmytro Mishkin | Computer Vision Research | Kaggle, ML & Education (2020)
- PyTorch code for ICLR-20 Paper "Learning to Explore using Active Neural SLAM"
- FaceTracker - Real time deformable face tracking in C++ with OpenCV 3.
- Awesome Super Resolution
- Adversarial Latent Autoencoders
- ElasticFusion - Real-time dense visual SLAM system capable of capturing comprehensive dense globally consistent surfel-based maps of room scale environments explored using an RGB-D camera.
- StegaStamp: Invisible Hyperlinks in Physical Photographs
- Pose Animator - Takes a 2D vector illustration and animates its containing curves in real-time based on the recognition result from PoseNet and FaceMesh. (HN)
- fvcore - Collection of common code that's shared among different research projects in FAIR computer vision team.
- Making Sense of Vision and Touch: Multimodal Representations for Contact-Rich Tasks (2020)
- ScreenPoint - Project an image centroid to another image using OpenCV.
- U^2-Net - Code for the Pattern Recognition 2020 paper "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection".
- TorchIO - Tools for medical image processing in deep learning.
- Real time Image Animation in OpenCV using first order model (HN)
- OpenMV (Open-Source Machine Vision) - Aims at making machine vision more accessible to beginners by developing a user-friendly, open-source, low-cost machine vision platform.
- TSD - 1st place models in Google OpenImage Detection Challenge 2019.
- Training-Time-Friendly Network for Real-Time Object Detection
- Big Transfer (BiT): General Visual Representation Learning
- Fast Human Pose Estimation CVPR2019
- Deep High-Resolution Representation Learning for Human Pose Estimation
- Background Matting: The World is Your Green Screen
- DE⫶TR: End-to-End Object Detection with Transformers
- PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization
- Tracking Objects as Points
- VIBE - Video Inference for Human Body Pose and Shape Estimation.
- SRZoo - Integrated repository for super-resolution using deep learning.
- mAP (mean Average Precision) - Evaluates the performance of your neural net for object recognition.
- Neural Pose Transfer by Spatially Adaptive Instance Normalization (2020)
- Awesome Neural Rendering
- Learning To Classify Images Without Labels
- Deep Leakage From Gradients (2019)
- 3Dflow - Offers customized computer vision software solutions.
- labelme - Image Polygonal Annotation with Python.
- imgviz - Image Visualization Tools.
- Attention-Guided Hierarchical Structure Aggregation for Image Matting
- YOLOv5 Is Here: State-of-the-Art Object Detection at 140 FPS (2020) (HN) (Code)
- DetectoRS - Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution.
- PyTorch implementation of paper Real-time Facial Surface Geometry from Monocular Video on Mobile GPUs
- VirTex: Learning Visual Representations from Textual Annotations
- High-Resolution 3D Human Digitization from A Single Image
- FairMOT - Simple baseline for one-shot multi-object tracking.
- Implicit Neural Representations with Periodic Activation Functions (2020)
- MSeg: A Composite Dataset for Multi-Domain Segmentation
- Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results
- MMDetection - OpenMMLab Detection Toolbox and Benchmark.
- Fourier Feature Networks in TensorFlow 2
- Computer Vision Lab | ETH Zurich
- PyTorch Computer Vision Library for Experts and Beginners (2020)
- Computer Vision Pretrained Models
- Fawkes: Image “Cloaking” for Personal Privacy (HN)
- Motion - Software motion detector.
- Supervised 3D Mesh Reconstruction (2020)
- NeRF in the Wild - Neural Radiance Fields for Unconstrained Photo Collections.
- NASA: Neural Articulated Shape Approximation (2020)
- An Overview of Deep Learning Architectures in Few-Shot Learning Domain (2020)
- FutureMapping: The Computational Structure of Spatial AI Systems (2018) (Tweet)
- Optimal Peanut Butter and Banana Sandwiches (2020) (Twitter)
- Gesture Recognition with Line Integrals (Code)
- Computer Vision: Looking Back to Look Forward (2020)
- DAIN (Depth-Aware Video Frame Interpolation)
- Picsellia - Development platform dedicated to Computer Vision.
- Official implementation of "PifPaf: Composite Fields for Human Pose Estimation" in PyTorch
- Object Recognition with Gradient-Based Learning (1999)
- Imaginaire - NVIDIA PyTorch GAN library with distributed and mixed precision support. (Docs)
- DeepBackSub - Virtual Video Device for Background Replacement with Deep Semantic Segmentation.
- Awesome Tiny Object Detection
- Flow-edge Guided Video Completion
- 5 Things to look for in a Computer Vision startup job (2020)
- Transformers for Image Recognition at Scale (2020) (HN)
- nnU-Net - Segmentation method designed to deal with dataset diversity.
- batchgenerators - Framework for data augmentation for 2D and 3D image classification and segmentation.
- Lookuq - App to create object detection projects without coding. (HN)
- InsightFace - Face Analysis Project on MXNet. (Web)
- PyTorch implementation of SwAV (Swapping Assignments between Views)
- Asymmetric Loss For Multi-Label Classification in PyTorch
- Antialiased CNNs - Making Convolutional Networks Shift-Invariant Again.
- Perceptual Similarity Metric and Dataset - Unreasonable Effectiveness of Deep Features as a Perceptual Metric.
- Deep Learning Anime Papers
- Vision Transformer - Models from the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.
- Handsfree.js - Wrapper library around computer vision models for working with face pointers, assistive tech, and creative expression. (Web)
- ZeroQ: A Novel Zero Shot Quantization Framework
- SqueezeNext - Contains the Caffe implementation of SqueezeNext.
- ANODE: Adjoint Based Neural ODEs
- Python Video Stabilization using OpenCV
- Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
- TorchCV - PyTorch vision library that mimics ChainerCV.
- Vision Transformer in PyTorch
- MedicalTorch - Medical imaging framework for PyTorch. (Docs)
- imagecluster - Cluster images based on image content using a pre-trained deep neural network, optional time distance scaling and hierarchical clustering.
- Detecto - Build fully-functioning computer vision models with PyTorch. (Docs)
- EmoPy - Deep neural net toolkit for emotion analysis via Facial Expression Recognition (FER).
- PyTorch Implementation of "NVAE: A Deep Hierarchical Variational Autoencoder"
- Label Decoupling Framework for Salient Object Detection
- MONAI - PyTorch-based, open-source framework for deep learning in healthcare imaging, part of PyTorch Ecosystem. (Web)
- Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection
- Faster R-CNN Explained for Object Detection Tasks (2020)
- How to Install OpenCV on a Raspberry Pi (2020)
- Contextual Encoder-Decoder Network for Visual Saliency Prediction
- PyImageSearch - Master Computer Vision, Deep Learning, and OpenCV.
- Natural Adversarial Examples - Harder ImageNet Test Set.
- How to upload 50 OpenCV frames into cloud storage within 1 second (2020)
- Egocentric Videoconferencing (2020) - Method for egocentric videoconferencing that enables hands-free video calls, for instance by people wearing smart glasses or other mixed-reality devices. (Video overview)
- gradslam - Open source differentiable dense SLAM library for PyTorch.
- High-Resolution Daytime Translation Without Domain Labels
- Holistically-Nested Edge Detection
- pycls - Image classification codebase, written in PyTorch.
- PyTorch implementation of High-Fidelity Generative Image Compression + Routines for neural image compression
- How Useful is Self-Supervised Pretraining for Visual Tasks?
- PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models
- InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image
- Multi-object trackers in Python - Easy to use implementation of various multi-object tracking algorithms.
- Stanford Vision and Learning Lab (GitHub)
- Learning computer vision. Overview of methods and software (2018)
- Image embeddings. Image similarity and building (2020) (Code)
- All You Need to Know About Object Detection Systems (2020)
- Lightly - Computer vision framework for self-supervised learning.
- DISK: Learning local features with policy gradient (2020) (Code)
- Caer - Lightweight Computer Vision library for high-performance AI research. (Intro)
- Awesome Image to Image Translation Papers
- EfficientDet: Scalable and Efficient Object Detection, in PyTorch
- UNet: semantic segmentation with PyTorch
- Exploring Simple Siamese Representation Learning (2020) (Code) (Code)
- Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
- Nerfies: Deformable Neural Radiance Fields (Code)
- Timeception for Complex Action Recognition (2019) (Code)
- Programming Computer Vision with Python (2014) (Code) (Notes)
- Fast and Accurate One-Stage Space-Time Video Super-Resolution (2020)
- pixelNeRF: Neural Radiance Fields from One or Few Images (2020) (Code)
- vedadet - Single stage object detector toolbox based on PyTorch.
- OneNet: End-to-End One-Stage Object Detection by Classification Cost
- Consistent Video Depth Estimation - Estimate dense, flicker-free, geometrically consistent depth from monocular video, for example hand-held cell phone video.
- Implicit Neural Representations with Periodic Activation Functions
- Computational Imaging Stanford Lab
- Trimap-Free Solution for Portrait Matting in Real Time
- Local Light Field Fusion
- Awesome Crowd Counting
- Neural Sparse Voxel Fields (NSVF)
- One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing (2020) (Tweet)
- SharpAI DeepCamera - Open source stack for machine learning engineering with private deployment and AutoML for edge computing. (HN)
- Contrastive learning of global and local features for medical image segmentation with limited annotations
- Real-Time High-Resolution Background Matting (2020) (Code)
- Torchreid - Deep learning person re-identification in PyTorch.
- Unsupervised Embedding Learning via Invariant and Spreading Instance Feature
- img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation
- SSD: Single Shot MultiBox Detector | a PyTorch Tutorial to Object Detection
- PCT: Point Cloud Transformer (2020) (Code)
- Learning Continuous Image Representation with Local Implicit Image Function (2020) (Code)
- Computer Vision Annotation Tool (CVAT)
- DeiT: Data-efficient Image Transformers
- Awesome Implicit Neural Representations
- ImageAI - Python library built to empower developers to build applications and systems with self-contained Computer Vision capabilities. (Web)
- RAIVN Lab - Reasoning, AI and VisioN (RAIVN) Lab. (GitHub)
- Norfair - Customizable lightweight Python library for real-time 2D object tracking.
- Universal Style Transfer in PyTorch
- NVIDIA Deep learning Dataset Synthesizer (NDDS)
- Object Detection at 2530 FPS with TensorRT and 8-Bit Quantization (2020)
- HTML4Vision - Simple HTML visualization tool for computer vision research.
- Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders
- Taming Transformers for High-Resolution Image Synthesis
- X-Temporal - Easily implement SOTA video understanding methods with PyTorch on multiple machines and GPUs.
- NanoDet - Super fast and lightweight anchor-free object detection model. Real-time on mobile devices.
- PyTorch Image Models
- Awesome Vision and Language - Curated list of awesome vision and language resources.
- DropBlock: A regularization method for convolutional networks (2018) (Code)
- Glasses - Compact, concise and customizable deep learning computer vision library. (Web)
- Explorable Super Resolution (2019)
- PySceneDetect - Python and OpenCV-based scene cut/transition detection program & library.
- Best Practices for Building Computer Vision Models (2021)
- TIDE - General Toolbox for Identifying Object Detection Errors.
- Sparse R-CNN: End-to-End Object Detection with Learnable Proposals (2020) (Code)
- Unsplash Image Search - Search photos on Unsplash using natural language.
- Kimera Semantics - Real-Time 3D Semantic Reconstruction from 2D data.
- Voxblox++ - Volumetric object-level semantic mapping framework.
- Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Surfaces (Code)
- Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video (2020) (Code)
- DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation (2019) (Code)
- Awesome Neural Radiance Fields
- D2Det: Towards High Quality Object Detection and Instance Segmentation (2020)
- DetCo: Unsupervised Contrastive Learning for Object Detection (2021) (Code)
- Computer Vision Video Lectures - Curated list of free, high-quality, university-level courses with video lectures related to the field of Computer Vision.
- Cord - Training data toolbox for computer vision. (HN)
- Text-Guided Editing of Images (Using CLIP and StyleGAN)
- torchvision - Datasets, Transforms and Models specific to Computer Vision. (Web)
- MeInGame: Create a Game Character Face from a Single Portrait (2021) (Code)
- Awesome Deep Vision
- dataset-tools - Tools for quickly normalizing image datasets.
- Using Streamlit to visualize object detection output (2021)
- Mobile Computer Vision @ Facebook
- Opening the black box of vision AI algorithms (2021)
- CompreFace - Free face recognition solution that can be easily integrated into any IT system without prior machine learning skills.
- IBRNet: Learning Multi-View Image-Based Rendering (2021) (Code)
- From Coarse to Fine: Robust Hierarchical Localization at Large Scale (2019) (Code)
- Camera Response Function (2021)
- I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image (2020) (Code)
- SkipNet: Learning Dynamic Routing in Convolutional Networks (2018) (Code)
- Mrcal - Camera Calibrations and More. (HN)
- Digging Into Self-Supervised Monocular Depth Estimation (2019) (Code) (Code)
- VISSL - FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images. (Web)
- Zumo Labs - Generate custom synthetic data sets that result in more robust and reliable computer vision models. (GitHub)
- Oriented Object Detection in Aerial Images with Box Boundary-Aware Vectors (2020) (Code)
- Perceiver: General Perception with Iterative Attention (2021) (Code)
- SEER: The start of a more powerful, flexible, and accessible era for computer vision (2021)
- NerFACE: Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction (2021)
- Neural 3D Video Synthesis
- Involution: Inverting the Inherence of Convolution for Visual Recognition (2021) (Code)
- Awesome Causality in Computer Vision
- Vision Transformers for Dense Prediction (2021) (Code)
- LoFTR: Detector-Free Local Feature Matching with Transformers (2021) (Code)
- ccv - C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library.
- Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes (2020) (Code)
- AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control (2021) (Tweet)
- Computer Vision and Embroidery (2021) (Code)
- mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields (2021)
- Python libraries I use every day for computer vision work (2021)
- Awesome Temporal Sentence Grounding in Videos
- The Affective Growth of Computer Vision
- Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D (2020) (Code)
- End-to-End Video Instance Segmentation with Transformers (2021) (Code)
- SAHI: Slicing Aided Hyper Inference
- FOVO: A new 3D rendering technique based on human vision (2020) (HN)
- Is Space-Time Attention All You Need for Video Understanding? (2021) (Code)
- Awesome Visual-Transformer - Transformer with Computer-Vision (CV) papers.
- PyTorchVideo - Deep learning library for video understanding research. (Web)
- Self-supervised Video Object Segmentation by Motion Grouping (2021) (HN) (Code)
- torchvideo - Datasets, transforms and samplers for video in PyTorch.
- A General and Adaptive Robust Loss Function (2019) (Code)
- Self-supervising Fine-grained Region Similarities for Large-scale Image Localization (2020) (Code)
- MaX-DeepLab: Dual-Path Transformers for End-to-End Panoptic Segmentation (2021)
- Vizy - AI Camera.
- MMPX Style-Preserving Pixel Art Magnification (2021) (HN)
- Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion (Code)
- SuperPoint: Self-Supervised Interest Point Detection and Description (2018) (Code)
- Multi-Stage Progressive Image Restoration (2021) (Code)
- COLMAP - General-purpose Structure-from-Motion (SfM) and Multi-View Stereo (MVS) pipeline with a graphical and command-line interface. (Docs)
- Awesome Vision-based SLAM / Visual Odometry
- Barlow Twins: Self-Supervised Learning via Redundancy Reduction (2021) (Code)
- HIPCL - OpenCL/SPIR-V implementation of HIP.
- MMCV - Foundational library for computer vision research and supports many research projects. (Docs)
- MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding (2021) (Code)
- Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples (2021) (Code) (Code)
- Emerging Properties in Self-Supervised Vision Transformers (2021) (Code) (Tweet) (Tweet)
- Geometry-Free View Synthesis: Transformers and no 3D Priors (2021) (Code)
- Easily Transform Portraits of People into AI Aberrations Using StyleCLIP (2021)
- DeepMetaHandles: Learning Deformation Meta-Handles of 3D Meshes with Biharmonic Coordinates (2021) (Code)
- Onepanel - Open and extensible integrated development environment (IDE) for computer vision. (Web)
- Vector Neurons: A General Framework for SO(3)-Equivariant Networks (2021) (Code)
- ISTR: End-to-End Instance Segmentation with Transformers (2021) (Code)
- MLP-Mixer: An all-MLP Architecture for Vision (2021) (Code) (Code)
- Self-attention building blocks for computer vision applications in PyTorch
- LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference
- Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary (2021) (Web) (Code)
- Neural Rendering: How Low Can You Go in Terms of Input? (2021)
- Enhancing Photorealism Enhancement (2021) (Paper) (Code)
- DeepFaceEditing: Deep Face Generation and Editing with Disentangled Geometry and Appearance Control (2021) (Code)
- Omnimatte: Associating Objects and Their Effects in Video (2021)
- Rethinking "Batch" in BatchNorm (2021)
- Most popular metrics used to evaluate object detection algorithms
- UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation (2020) (Code)
- Synthetic for Computer Vision - List of synthetic datasets and tools for computer vision.
- vision_blender - Blender addon for generating synthetic ground truth data for Computer Vision applications.
- Easy Few-Shot Learning - Ready-to-use code and tutorial notebooks to boost your way into few-shot image classification.
- BasicSR (Basic Super Restoration) - Open source image and video restoration toolbox based on PyTorch, such as super-resolution, denoise, deblurring, JPEG artifacts removal, etc.
- Intriguing Properties of Vision Transformers (2021) (Reddit)
- DIY Amazon Go – computer vision tutorial for cashierless checkout
- Image Retrieval in the Wild (2020)
- Awesome Transformer in CV papers
- Sensor Calibration from Scratch with Rust (2021)
- Tangram Vision - Integrate, Calibrate Perception Sensors For Robots, Drones & Automation. (Blog)
- Rust CV - Project to implement computer vision algorithms, abstractions, and systems in Rust.
- Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control (2021) (HN)
- Robust Instance Segmentation through Reasoning about Multi-Object Occlusion (2021) (Code)
- MERLOT: Multimodal Neural Script Knowledge Models (2021) (Tweet)
- Scaling Vision Transformers (2021)
- Self-Supervised Scene De-occlusion (2020) (Code)
- Pivotal Tuning for Latent-based Editing of Real Images (2021) (Code)
- FLAME: Articulated Expressive 3D Head Model (Code)
- XCiT: Cross-Covariance Image Transformers (2021) (Code)
- Robust Consistent Video Depth Estimation (2021) (Code)
- cvpods - All-in-one Toolbox for Computer Vision Research.
- CDFI: Compression-Driven Network Design for Frame Interpolation (2021) (Code)
- NeRF--: Neural Radiance Fields Without Known Camera Parameters (2021) (Code) (Code)
- Oxford Active Vision Laboratory (GitHub)
- Computer Vision: Algorithms and Applications, 2nd ed.
- motionEyeOS - Linux distribution that turns your single board computer into a video surveillance system.
- Long-Short Transformer: Efficient Transformers for Language and Vision (2021) (Code)
- Feature Visualization – How NNs understand images (2017)
- What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis (2019) (Code)
- Convolutional Hough Matching Networks (2021) (Code)
- Efficient Self-Supervised Vision Transformers (EsViT)
- ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases (2021) (Code) (Paper Read) (Article)
- CO3D: Common Objects In 3D - Tools for working with the Common Objects in 3D (CO3D) dataset.
- ORBIT: A Real-World Few-Shot Dataset for Teachable Object Recognition (2021) (Code)
- Vision Transformer Architecture Search (2021) (Code)
- TSIT: A Simple and Versatile Framework for Image-to-Image Translation (2020) (Code)
- Recognizing People in Photos Through Private On-Device Machine Learning (2021)
- CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation (2021) (Code)
- HPNet: Deep Primitive Segmentation Using Hybrid Representations (2021) (Code)
- Portal - Fastest way to load and visualize your deep neural networks on images and videos.
- Awesome Human Pose Estimation
- Learning A Single Network for Scale-Arbitrary Super-Resolution (2021) (Code)
- PyTorch implementation for Vision Transformer
- Repulsive Curves - Model 2D & 3D curves while avoiding self-intersection. (Tweet) (Code) (HN)
- SDEdit: Image Synthesis and Editing with Stochastic Differential Equations (Code)
- Region Similarity Representation Learning (2021) (Code)
- NeX: Real-time View Synthesis with Neural Basis Expansion (2021) (Code)
- Convolutional Occupancy Networks (2020) (Code)
- Learning Optical Flow from a Few Matches (2021) (Code)
- Visual Parser: Representing Part-whole Hierarchies with Transformers (2021) (Code)
- Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation (Code)
- On Generating Transferable Targeted Perturbations (2021) (Code)
- Awesome Scene Understanding - List of papers for scene understanding.
- Align before Fuse: Vision and Language Representation Learning with Momentum Distillation (2021) (Code)
- DONeRF: Towards Real-Time Rendering of Compact Neural Radiance Fields using Depth Oracle Networks (2021) (Code)
- Object Detection in an Hour (2021) (HN)
- Fixing the train-test resolution discrepancy (2020) (Code)
- Align Deep Features for Oriented Object Detection (2020) (Code)
- Vision-Language Transformer and Query Generation for Referring Segmentation (2021) (Code)
- Depth-supervised NeRF: Fewer Views and Faster Training for Free (2021) (Code)
- SwinIR: Image Restoration Using Swin Transformer (2021) (Code)
- You Only Learn One Representation: Unified Network for Multiple Tasks (2021) (Code)
- Probabilistic Modeling for Human Mesh Recovery (2021) (Code)
- BARF: Bundle-Adjusting Neural Radiance Fields (2021) (Code)
- Self-Calibrating Neural Radiance Fields (2021) (Code)
- Transformers-Tutorials - Demos made with the Transformers library by HuggingFace.
- 3D Human Texture Estimation from a Single Image with Transformers (2021) (Code)
- CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval (2021) (Code)
- RAFT: Recurrent All Pairs Field Transforms for Optical Flow (2020) (Code)
- Volume rendering + 3D implicit surface = Neural 3D Reconstruction
- Hierarchical Deep Stereo Matching on High-resolution Images (2019) (Code)
- Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering (2021) (Code)
- Image Synthesis via Semantic Composition (2021) (Code)
- Awesome-Edge-Detection-Papers
- Awesome-Image-Colorization
- Face Recognition - 2D and 3D Face alignment library built using PyTorch.
- Awesome image retrieval papers
- PeekingDuck - Modular framework built to simplify Computer Vision inference workloads.
- Pri3D: Can 3D Priors Help 2D Representation Learning? (2021) (Code)
- FaceXLib - Aims at providing ready-to-use face-related functions based on current SOTA open-source methods.
- MMAction2 - Open-source toolbox for video understanding based on PyTorch.
- Awesome Collision Detection
- Video Super-Resolution Transformer (2021) (Code)
- NeRF Atlas - Collection of NeRF extensions for fun and experimentation.
- Training and testing codes for USRNet, DnCNN, FFDNet, SRMD, DPSR, MSRResNet, ESRGAN, BSRGAN, SwinIR
- Uformer: A General U-Shaped Transformer for Image Restoration (2021) (Code) (Code)
- Self-Supervised Pretraining Improves Self-Supervised Pretraining (2021) (Code)
- SNARF: Differentiable Forward Skinning for Animating Non-Rigid Neural Implicit Shapes (2021) (Code)
- HRFormer: High-Resolution Transformer for Dense Prediction, NeurIPS 2021
- IceVision - Agnostic Computer Vision Framework - Pluggable to any Training Library: Fastai, Pytorch-Lightning with more to come. (Docs)
- e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks (2021) (Tweet)
- Attention Gated Networks (Image Classification & Segmentation) in PyTorch
- Full-Duplex Strategy for Video Object Segmentation (2021) (Code)
- YoHa - Practical hand tracking engine. (HN) (Code)
- Deep Learning for Face Anti-Spoofing: A Survey (2021) (Code)
- A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation (2021) (Code)
- Resolution-robust Large Mask Inpainting with Fourier Convolutions (2021) (Code)
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (2021) (Code) (Code)
- ADOP: Approximate Differentiable One-Pixel Point Rendering (2021) (Tweet) (Tweet) (Code)
- Patches Are All You Need? (2021) (Code)
- ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation (2020) (Code)
- Video Panoptic Segmentation (2020) (Code)
- Awesome-ICCV2021-Low-Level-Vision - Collection of Papers and Codes for ICCV2021 Low Level Vision and Image Generation.
- Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts (2021) (Code)
- Non-deep Networks (2021) (Code)
- receptivefield - Gradient based receptive field estimation for Convolutional Neural Networks.
- Iso-Points: Optimizing Neural Implicit Surfaces with Hybrid Representations (2021) (Code)
- Neural Articulated Radiance Field (2021) (Code)
- Efficient Visual Pretraining with Contrastive Detection (2021) (Code)
- VoTT (Visual Object Tagging Tool) - Open source annotation and labeling tool for image and video assets.
- FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes (2021) (Code)
- ByteTrack: Multi-Object Tracking by Associating Every Detection Box (2021) (Code)
- Dense Video Captioning with Bi-modal Transformer (2020) (Code)
- PyTorch-Encoding - CV toolkit accompanying the author's papers. (Docs)
- Space Time Recurrent Memory Network (2021) (Code)
- CVNets - Library for training computer vision networks.
- Scenic - Jax Library for Computer Vision Research and Beyond. (Paper)
- CV Arxiv Daily (Code)
- OpenVisionCapsules - Set of libraries for encapsulating smart vision algorithms.
- MedMNIST: Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification (Code)
- Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language (2021) (Code)
- Neural-Pull: Learning Signed Distance Functions from Point Clouds by Learning to Pull Space onto Surfaces (2021) (Code)
- The 2021 Image Similarity Dataset and Challenge (2021) (Code)
- K-Net: Towards Unified Image Segmentation (2021) (Code)
- Yolov5 + Deep Sort with PyTorch
- Shape As Points: A Differentiable Poisson Solver (2021) (Code)
- Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm (2021) (Code)
- Awesome Vision-Language Navigation
- An Exploration of Embodied Visual Exploration (2021) (Code)
- DVC: An End-to-end Deep Video Compression Framework (2019) (Code)
- Pixray - Neural image generation.
- Unsupervised Learning of Compositional Energy Concepts (2021) (Tweet)
- Learning with Noisy Labels for Robust Point Cloud Segmentation (2021) (Code)
- Kalidoface - Become a virtual character with just your webcam. (Web)
- KalidoKit - Face, Pose, and Hand Tracking Kinematics.
- Unsupervised Real-world Image Super Resolution via Domain-distance Aware Training (2020) (Code)
- PyGaze - Open source eye-tracking software and more. (HN)
- Exploring Relational Context for Multi-Task Dense Prediction (2021) (Code)
- Neural Scene Graphs for Dynamic Scenes (2021) (Code)
- Image Super-Resolution via Iterative Refinement (HN) (Code)
- UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning (2021) (Code)
- Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers (2021) (Code)
- Multimodal Virtual Point 3D Detection (2021) (Code)
- SiT: Self-supervised vIsion Transformer
- Attention Mechanisms in Computer Vision: A Survey (2021)
- Awesome Vision Attention Papers
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation (2021) (Code)
- RenderNet: A deep convolutional network for differentiable rendering from 3D shapes (2018) (Code)
- Masked Autoencoders Are Scalable Vision Learners (2021) (Code) (Code) (Code)
- BoostingMonocularDepth
- It's About Time: Analog Clock Reading in the Wild (2021) (Tweet) (Code)
- Learning to Compose Visual Relations (2021) (Code)
- LF-Net: Learning Local Features from Images (2018) (Code)
- Aligning Pretraining for Detection via Object-Level Contrastive Learning (2021) (Code)
- Look at the Variance! Efficient Black-box Explanations with Sobol-based Sensitivity Analysis (2021) (Code)
- Deep unfolding network for image super-resolution (2020)
- VOLO: Vision Outlooker for Visual Recognition (2021) (Code)
- Direct Multi-view Multi-person 3D Pose Estimation (2021) (Code)
- Image2Mesh: A learning framework for single image 3D reconstruction (2019) (Code)
- GammaCV - WebGL accelerated Computer Vision library for modern web applications. (Web)
- Localizing Objects with Self-Supervised Transformers and no Labels (2021) (Code)
- Harvester - GenICam-based Image Acquisition Python Library.
- NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion (2021) (Code) (PyTorch Code)
- ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision (2021) (Code)
- MetaFormer is Actually What You Need for Vision (2021) (Code)
- ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape Generators (2021) (Code)
- Mesa: A Memory-saving Training Framework for Transformers (2021) (Code)
- MMPose - Open-source toolbox for pose estimation based on PyTorch. (Docs)
- An Empirical Study of Training End-to-End Vision-and-Language Transformers (2021) (Code)
- Useful computer vision PhD resources
- Tenyks - Data-centric Computer Vision.
- Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation (2021) (Code)
- GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields (2021) (Code)
- Learning to See by Looking at Noise (2021) (Code)
- iBOT: Image BERT Pre-Training with Online Tokenizer (2021) (Code)
- Grounded Language-Image Pre-training (2021) (Code)
- 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction (2016) (Code)
- Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks (Code)
- Awesome Visual Grounding
- Are Transformers More Robust Than CNNs? (2021) (Code)
- Plenoxels: Radiance Fields without Neural Networks (2021) (Code) (Code)
- GFPGAN - Developing Practical Algorithms for Real-world Face Restoration.
- Awesome Video Stabilization
- MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo (2021) (Code)
- Tracking People with 3D Representations (2021) (Code)
- Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection (2019) (Code)
- Learning to Stylize Novel Views (2021) (Code)
- YOLOX - High-performance anchor-free YOLO. (Docs)
- PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop (2021) (Code)
- SeqFormer: a Frustratingly Simple Model for Video Instance Segmentation (2021) (Code)
- NeRD: Neural Reflectance Decomposition from Image Collections (2021) (Code)
- Vector Quantized Diffusion Model for Text-to-Image Synthesis (2021) (Code) (Code) (Code)
- GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models (2021) (Code)
- SynthDet - End-to-end object detection pipeline using synthetic data.
- MPViT: Multi-Path Vision Transformer for Dense Prediction (2021) (Code)
- StyleSwin: Transformer-based GAN for High-resolution Image Generation (2021) (Code)
- Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline (2021) (Code)
- SLIP: Self-supervision meets Language-Image Pre-training (2021) (Code)
- General Facial Representation Learning in a Visual-Linguistic Manner (2021) (Code) (Code)
- HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields (Code) (HN)
- Learning to Regress Bodies from Images using Differentiable Semantic Rendering (2021) (Code)
- High-Resolution Image Synthesis with Latent Diffusion Models (2021) (Code)
- Photorealistic Audio-driven Video Portraits (2020) (Code)
- Awesome Hand Pose Estimation
- Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers (2021) (Code)
- Transformer Interpretability Beyond Attention Visualization (2021) (Code)
- StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis (2021) (Code)
- Light Field Image Super-Resolution with Transformers (2021) (Code)
- Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes (2021) (Code)
- DeepSIM: Image Shape Manipulation from a Single Augmented Training Sample (2021) (Code)
- RAFT-3D: Scene Flow using Rigid-Motion Embeddings (2021) (Code)
- Unsupervised Indoor Depth Estimation (2020) (Code)
- A-NeRF: Articulated Neural Radiance Fields for Learning Human Shape, Appearance, and Pose (2021) (Code)
- Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective (2021) (Code)
- Sara - Easy-to-Use C++ Computer Vision Library.
- RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching (2021) (Code)
- U-2-Net: Going Deeper with Nested U-Structure for Salient Object Detection (2020) (Code)
- Language as Queries for Referring Video Object Segmentation (2022) (Code)
- Localization with Sampling-Argmax (2021) (Code)
- VOCA: Voice Operated Character Animation (Code)
- CVZone - Computer vision package that makes it easy to run image processing and AI functions.
- Deepface - Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python.
- Location-aware Single Image Reflection Removal (2021) (Code)
- MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement (2021) (Code)
- Detecting Twenty-thousand Classes using Image-level Supervision (2022) (Code)
- Language-driven Semantic Segmentation (2022) (Code)
- Rethinking Nearest Neighbors for Visual Classification (2021) (Code)
- Vision Transformer with Deformable Attention (2022) (Code) (Code)
- KerasCV - Industry-strength Computer Vision workflows with Keras.
- Instant Neural Graphics Primitives - Lightning fast NeRF and more.
- Dynamic Head: Unifying Object Detection Heads with Attentions (2021) (Code)
- ELSA: Enhanced Local Self-Attention for Vision Transformer (2021) (Code)
- FFCV - Fast Forward Computer Vision (and other ML workloads!) (Web)
- Awesome Vit - Curated list and survey of awesome Vision Transformers.
- Instant Neural Graphics Primitives with a Multiresolution Hash Encoding (2022) (Code) (Code) (Video Summary) (HN)
- Road Extraction by Deep Residual U-Net (2017) (Code)
- Single-Stage 6D Object Pose Estimation (2019) (Code)
- Visual Task Adaptation Benchmark (VTAB)
- TAda! Temporally-Adaptive Convolutions for Video Understanding (2022) (Code)
- UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction (2021) (Code)
- Co-Fusion: Real-time Segmentation, Tracking and Fusion of Multiple Objects (2020) (Code)
- VRT: A Video Restoration Transformer (2021) (Code)
- Unknown Object Segmentation from Stereo Images (2021) (Code)
- Stacked Cross Attention for Image-Text Matching (2018) (Code)
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (2022) (Code)
- DeFlow: Learning Complex Image Degradations from Unpaired Data with Conditional Flows (2021) (Code)
- DocFormer: End-to-End Transformer for Document Understanding (2022) (Code)
- SeMask: Semantically Masked Transformers for Semantic Segmentation (2021) (Code)
- Image Quality Assessment: Unifying Structure and Texture Similarity (2020) (Code)
- Learning Super-Features for Image Retrieval (2022)
- YOLOv7 - Framework Beyond Detection.
- A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model (2021) (Code)
- Single/Multiple Object Tracking and Segmentation
- Learnable Multi-level Frequency Decomposition and Hierarchical Attention Mechanism for Generalized Face Presentation Attack Detection (2021) (Code)
- HifiFace: 3D Shape and Semantic Prior Guided High Fidelity Face Swapping (2021) (Code)
- Scalable Large Scene Neural View Synthesis (2022) (HN)
- Transformer Recipe - Quick recipe to learn all about Transformers.
- NeROIC: Neural Rendering of Objects from Online Image Collections (2022) (Code)
- DiffusionNet: Discretization Agnostic Learning on Surfaces (2022) (Code)
- FILM: Frame Interpolation for Large Motion (2022) (Code)
- Learning Signed Distance Field for Multi-view Surface Reconstruction (2021) (Code)
- Deep Metric Learning in PyTorch
- ICON: Implicit Clothed humans Obtained from Normals (2021) (Code)
- CLIPasso: Semantically-Aware Object Sketching (2022) (Code)
- BANMo: Building Animatable 3D Neural Models from Many Casual Videos (2022) (Code)
- How Do Vision Transformers Work?
- Top 10 Computer Vision Papers of 2021
- Exploring Sparsity in Image Super-Resolution for Efficient Inference (2021) (Code)
- AutoInt: Automatic Integration for Fast Neural Volume Rendering (2021)
- Learning to Prompt for Vision-Language Models (2021) (Code)
- Summarizing Videos with Attention (2019) (Code)
- vkit - Toolkit designed for CV (Computer Vision) developers. (Docs)
- Generative Adversarial Graph Convolutional Networks for Human Action Synthesis (2021) (Code)
- Awesome Image Matting
- Image-to-Markup Generation with Coarse-to-Fine Attention (Code)
- Push-ups with Python, mediapipe and OpenCV (HN)
- Lama-cleaner - Image inpainting tool powered by LaMa.
- Vision-Language Pre-Training with Triple Contrastive Learning (2022) (Code)
- 3D Machine Learning resources/papers
- FiftyOne - Open-source tool for building high-quality datasets and computer vision models.
- Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut (2022) (Code)
- Awesome Multiple Object Tracking
- Rethinking Coarse-to-Fine Approach in Single Image Deblurring (2021) (Code)
- Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling (2021) (Code)
- As-ViT: Auto-scaling Vision Transformers without Training (2022) (Code)
- Awesome 3D Body Papers
- RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth (2021) (Code)
- Image Similarity Challenge
- Blended Diffusion for Text-driven Editing of Natural Images (2021) (Code)
- The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization (2021) (Code)
- Awesome Object Pose
- Video Enhancement papers/resources
- PowerQE: An Open Framework for Quality Enhancement of Compressed Visual Data
- Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels (2022) (Code)
- Accurate Image Alignment and Registration Using OpenCV (2022) (HN)
- Video Grounding and Captioning
- Awesome Detection Transformer
- StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis (2021) (Code) (Web) (HN)
- Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition (2020) (Code)
- MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation (2021) (Code)
- DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection (2022) (Code)
- Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation (2022)
- CycleMLP: A MLP-like Architecture for Dense Prediction (2022) (Code)
- Image Quality Assessment Benchmark
- StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation (2021) (Code)
- Transformers, originally designed to handle language, are taking on vision (2022) (HN)
- Fast Image Processing with Fully-Convolutional Networks (2017) (Code)
- Efficient Attention: Attention with Linear Complexities (2020) (Code)
- Label-Efficient Semantic Segmentation with Diffusion Models (2022) (Code)
- hloc - Modular toolbox for state-of-the-art 6-DoF visual localization.
- All Tokens Matter: Token Labeling for Training Better Vision Transformers (2021) (Code)
- Deformable ConvNets v2: More Deformable, Better Results (2018) (Code)
- Restormer: Efficient Transformer for High-Resolution Image Restoration (2021) (Code)
- Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice (2022) (Code)
- NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video (2021) (Code)
- Awesome 3D Human Reconstruction
- Awesome 3D Human Resources List
- A ConvNet for the 2020s (2022) (Code) (Code)
- Remote-sensing-image-semantic-segmentation - Uses improved U-Net-based networks for semantic segmentation of remote sensing images. Built on Keras.
- Animatable Neural Radiance Fields for Modeling Dynamic Human Bodies (2021) (Code)
- TensoRF: Tensorial Radiance Fields (2022) (Code)
- Autoregressive Image Generation using Residual Quantization (2022) (Code) (Code)
- Pix2Pix Timbre Transfer
- One-Shot Adaptation of GAN in Just One CLIP (2022) (Code)
- PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds (2021) (Code)
- VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training (2022) (Code)
- Awesome Masked Image Modeling
- BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training (2022) (Code)
- A Transformer-Based Siamese Network for Change Detection (2022) (Code)
- Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition (2021) (Code)
- Robust fine-tuning of zero-shot models (2022) (Code)
- DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision (2021) (Code)
- GroupViT: Semantic Segmentation Emerges from Text Supervision (2022) (Code)
- HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening (2022) (Code)
- TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing (2022) (Code)
- DeepStream-Yolo - NVIDIA DeepStream SDK 6.0.1 configuration for YOLO models.
- An Empirical Investigation of 3D Anomaly Detection and Segmentation (2022) (Code)
- Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online Adaptation (2021) (Code)
- Layered Neural Atlases for Consistent Video Editing (2021) (Code)
- TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution (2020)
- Shape from Polarization for Complex Scenes in the Wild (2022) (Code)
- Pix2Seq - General framework for turning RGB pixels into semantically meaningful sequences.
- Gait Recognition in the Wild with Dense 3D Representations and A Benchmark (2022) (Code)
- Ensembling Hugging Face Transformers made easy
- Relational Knowledge Distillation (2019) (Code)
- NICE-SLAM: Neural Implicit Scalable Encoding for SLAM (2021) (Code)
- Neural 3D Mesh Renderer (2017) (Code)
- Large-scale Bilingual Language-Image Contrastive Learning (2022) (Code)
- OpenMVG - Open Multiple View Geometry library. Basis for 3D computer vision and Structure from Motion.
- Neural Points: Point Cloud Representation with Neural Fields (2021) (Code)
- OpenCV JS Web Worker - Getting started with OpenCV compiled to WebAssembly and loaded in a worker.
- Learning Graph Regularisation for Guided Super-Resolution (2022) (Code)
- Video Polyp Segmentation: A Deep Learning Perspective (2022) (Code)
- Adjacent Context Coordination Network for Salient Object Detection in Optical Remote Sensing Images (2022) (Code)
- HybridNets: End-to-End Perception Network (2022) (Code)
- HDR-NeRF: High Dynamic Range Neural Radiance Fields (2022) (Code)
- AdaMixer: A Fast-Converging Query-Based Object Detector (2022) (Code)
- MixFormer: End-to-End Tracking with Iterative Mixed Attention (2022) (Code)
- Bringing Old Films Back to Life (2022) (Code)
- Extracting Triangular 3D Models, Materials, and Lighting From Images (2022) (Code)
- LiT: Zero-Shot Transfer with Locked-image text Tuning (2021) (Tweet)
- LAFITE: Towards Language-Free Training for Text-to-Image Generation (2021) (Code)
- Neural 3D Video Synthesis from Multi-view Video (2022) (Code)
- ToFu: Topologically Consistent Multi-View Face Inference Using Volumetric Sampling (2021)
- Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning (2019) (Code)
- FrankMocap: A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator (2021)
- Reddit Place Script 2022 - Script to draw an image onto r/place.
- A Unified Objective for Novel Class Discovery (2021) (Code)
- Papers and Datasets about Point Cloud
- On the Importance of Asymmetry for Siamese Representation Learning (2022) (Code)
- REGTR: End-to-end Point Cloud Correspondences with Transformers
- A Closer Look at Local Aggregation Operators in Point Cloud Analysis (2020) (Code)
- Online Continual Learning on a Contaminated Data Stream with Blurry Task Boundaries (2022) (Code)
- Perception Prioritized Training of Diffusion Models (2022) (Code)
- VisualBERT: A Simple and Performant Baseline for Vision and Language (2019) (Code)
- MultiMAE: Multi-modal Multi-task Masked Autoencoders (2022) (Code)
- NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction (2021) (Code)
- Towards Open World Object Detection (2021) (Code)
- Transformer in Vision - Recent Transformer-based CV and related works.
- Shunted Self-Attention via Multi-Scale Token Aggregation (2021) (Code)
- Space-Time Correspondence as a Contrastive Random Walk (2020) (Code)
- MaskGIT: Masked Generative Image Transformer (2022) (Code)
- EasyCV - All-in-one computer vision toolbox based on PyTorch.
- Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection (2022) (Code)
- EMOCA: Emotion Driven Monocular Face Capture and Animation (2022)
- Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation (2022) (Code)
- FaceVerse: a Fine-grained and Detail-controllable 3D Face Morphable Model from a Hybrid Dataset (Code)
- PointCLIP: Point Cloud Understanding by CLIP (2022) (Code)
- DaViT: Dual Attention Vision Transformers (2022) (Code)
- DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers (2022) (Code)
- Recovering 3D Human Mesh from Monocular Images: A Survey (2022) (Code)
- Video Diffusion Models (2022) (Web) (Code)
- MaxViT: Multi-Axis Vision Transformer (2022) (Code)
- Unified Contrastive Learning in Image-Text-Label Space (2022) (Code)
- RePOSE: Fast 6D Object Pose Refinement via Deep Texture Rendering (2021) (Code)
- MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition (2021) (Code)
- Learning What Not to Segment: A New Perspective on Few-Shot Segmentation (2022) (Code)
- MAXIM: Multi-Axis MLP for Image Processing (2022) (Code)
- Tensil tutorial for YOLO v4 Tiny on Ultra96 V2 (2022)
- UNITER: UNiversal Image-TExt Representation Learning (2020) (Code)
- Consistent Depth of Moving Objects in Video (2021) (Code)
- Bridging Video-text Retrieval with Multiple Choice Questions (2022) (Code)
- Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation (2020) (Code)
- BACON: Band-limited Coordinate Networks for Multiscale Scene Representation (2022) (Code)
- Solving ImageNet: a Unified Scheme for Training any Backbone to Top Results (2022) (Code)
- Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering (2021) (Code)
- SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image (2022) (Code)
- StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions (2021) (Code)
- Neighborhood Attention Transformer (2022) (Code)
- 3D Surface Reconstruction From Multi-Date Satellite Images (2021) (Code)
- Decoupling Makes Weakly Supervised Local Feature Better (2022) (Code)
- ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic (2022) (Code)
- EasyMocap - Open-source toolbox for markerless human motion capture from RGB videos.
- QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation (2022) (Code)
- PolarMask: Single Shot Instance Segmentation with Polar Representation (2019) (Code)
- Latent Video Transformer (2020) (Code)
- NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (2020) (JAX Code)
- A Latent Transformer for Disentangled Face Editing in Images and Videos (2021) (Code)
- Photorealistic Style Transfer via Wavelet Transforms (2019) (Code)
- Probing ViTs
- Dense Depth Priors for Neural Radiance Fields from Sparse Input Views (2021) (Code)
- Self-Supervised Models are Continual Learners (2021) (Code)
- Mask Transfiner for High-Quality Instance Segmentation (2022) (Code)
- An Extendable, Efficient and Effective Transformer-based Object Detector (2022)
- Learned Queries for Efficient Local Attention (2021) (Code)
- 3D Human Pose Estimation with Spatial and Temporal Transformers (2021) (Code)
- 3D human pose estimation in video with temporal convolutions and semi-supervised training (2019) (Code)
- MC-Calib: A generic and robust calibration toolbox for multi-camera systems (2022) (Code)
- Understanding The Robustness in Vision Transformers (2022) (Code)
- Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation (2021) (Code)
- Tackling multiple tasks with a single visual language model (2022) (Code) (Tweet)
- Associating Objects with Transformers for Video Object Segmentation (2021) (Code)
- Simple multi-dataset detection - Object detection on multiple datasets with an automatically learned unified label space.
- Learning Texture Transformer Network for Image Super-Resolution (2020) (Code)
- Balanced MSE for Imbalanced Visual Regression (2022) (Code)
- Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions (2022) (Code)
- Action-Conditioned 3D Human Motion Synthesis with Transformer VAE (2021) (Code)
- CoMoGAN: continuous model-guided image-to-image translation (2021) (Code)
- OpenMVS - Open Multi-View Stereo reconstruction library.
- Sliced Recursive Transformer (2021) (Code)
- Neural Dual Contouring (2022) (Code)
- Awesome Deblurring - Curated list of resources for Image and Video Deblurring.
- CoCa: Contrastive Captioners are Image-Text Foundation Models (2022) (Code)
- Sequencer: Deep LSTM for Image Classification (2022)
- Language Models Can See: Plugging Visual Controls in Text Generation (2022) (Code)
- flyswot - CLI for Hugging Face Transformers image classification models.
- Neural 3D Scene Reconstruction with the Manhattan-world Assumption (2022) (Code)
- PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision (2022) (Code)
- What do the Vision Transformers learn? How do they encode anything useful for image recognition? (2022)
- Integrative Few-Shot Learning for Classification and Segmentation (2022) (Code)
- DeltaConv: Anisotropic Geometric Deep Learning with Exterior Calculus (2022) (Code)
- pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis (2021) (Code)
- Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework (2022) (Code)
- ConvMAE: Masked Convolution Meets Masked Autoencoders (2022) (Code)
- Deep Kernelized Dense Geometric Matching (2022) (Code)
- Unsupervised Semantic Segmentation by Distilling Feature Correspondences (2022) (Code)
- RecursiveMix: Mixed Learning with History (2022) (Code)
- MMDetection3D - OpenMMLab's next-generation platform for general 3D object detection.
- Imagen: Text-to-Image Diffusion Models (Tweet) (Code) (HN) (HN)
- An End-to-End Transformer Model for 3D Object Detection (2021) (Code)
- Neural 3D Reconstruction in the Wild (2022) (Code)
- Body shape and pose estimation on 3D scans of people in clothing using Ceres Solver
- A Survey of Visual Transformers (2021) (Code)
- Nerfies: Deformable Neural Radiance Fields (2021) (Code)
- Working notes on the role of vision papers in basic science (2022) (Tweet)
- CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers (2022) (HN)
- Prompt-aligned Gradient for Prompt Tuning (2022) (Code)
- Text2Human: Text-Driven Controllable Human Image Generation (2022) (Code)
- OnePose: One-Shot Object Pose Estimation without CAD Models (2022) (Code)
- PREF: Phasorial Embedding Fields for Compact Neural Representations (2022) (Code)
- Optimizing Relevance Maps of Vision Transformers Improves Robustness (2022) (Code)
- Exploring Visual Prompts for Adapting Large-Scale Models (2022) (Code)
- Deepfake Offensive Toolkit - Makes real-time, controllable deepfakes ready for virtual camera injection. (HN)
- Real-time Object Detection for Streaming Perception (2022) (Code)
- Volumentations 3D - Library for 3D augmentations.
- Awesome Learning with Label Noise
- LIVE: Towards Layer-wise Image Vectorization (2022) (Code)
- BEVT: BERT Pretraining of Video Transformers (2021) (Code)
- Variable Bitrate Neural Fields (2022) (Code)
- Gated-SCNN: Gated Shape CNNs for Semantic Segmentation (2019) (Code)
- Masked Unsupervised Self-training for Zero-shot Image Classification (2022) (Code)
- HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video (2022) (Code)
- Awesome Implicit NeRF Robotics
- EfficientFormer: Vision Transformers at MobileNet Speed (2022) (Code)
- ARF: Artistic Radiance Fields (2022) (Code) (HN)
- Patch2Pix: Epipolar-Guided Pixel-Level Correspondences (2020) (Code)
- Translating Images into Maps (2022) (Code)
- Instances as Queries (2021) (Code)
- OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic 3D Reconstruction (2022) (Code)
- CogView: Mastering Text-to-Image Generation via Transformers (2021) (Code)
- All in One: Exploring Unified Video-Language Pre-training (2022) (Code)
- Towards Exemplar-Free Continual Learning in Vision Transformers: an Account of Attention, Functional and Weight Regularization (2022) (Code)
- Solving Inefficiency of Self-supervised Representation Learning (2021) (Code)
- NeRFactor: Neural Factorization of Shape and Reflectance Under an Unknown Illumination (2021) (Code)
- Trending in 3D Vision
- ShapeFormer: Transformer-based Shape Completion via Sparse Representation (2022) (Code)
- Awesome Prompting Papers in Computer Vision
- EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation (2022) (Code)
- GenDR: A Generalized Differentiable Renderer (2022) (Code)
- Elucidating the Design Space of Diffusion-Based Generative Models (2022) (Code)
- IRON: Inverse Rendering by Optimizing Neural SDFs and Materials from Photometric Images (2022) (Code)
- Omnivore: A Single Model for Many Visual Modalities (2022) (Code)
- Benchmarking and Analyzing Point Cloud Classification under Corruptions (2022) (Code)
- DVGO: Direct Voxel Grid Optimization (Super-fast Convergence for Radiance Fields Reconstruction) (2022) (Code)
- RegionCLIP: Region-based Language-Image Pretraining (2021) (Code)
- Fast Light-Weight Near-Field Photometric Stereo (2022) (Code)
- ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models (2021) (Code)
- RePaint: Inpainting using Denoising Diffusion Probabilistic Models
- The Probabilistic Normal Epipolar Constraint for Frame-To-Frame Rotation Optimization under Uncertain Feature Positions (2022) (Code)
- 3D Moments from Near-Duplicate Photos (2022) (Code)
- Prototypical Contrastive Language Image Pretraining (2022) (Code)
- NeRV: Neural Representations for Videos (2021) (Code)
- MT-YOLOv6 - Single-stage object detection framework dedicated to industrial applications.
- Fast Point Transformer (2022) (Code)
- FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation (2022) (Code)
- Nettle Magic Project - Scanner for decks of cards with bar codes printed on card edges. (HN)
- Image Quality Assessment using Contrastive Learning (2021) (Code)
- Denoised MDPs: Learning World Models Better Than The World Itself (2022) (Code)
- Sparse Instance Activation for Real-Time Instance Segmentation (2022) (Code)
- Referring Image Matting (2022) (Code)
- Voxel-MAE: Masked Autoencoders for Pre-training Large-scale Point Clouds (2022) (Code)
- Contrastive Boundary Learning for Point Cloud Segmentation (2022) (Code)
- Scaling up Kernels in 3D CNNs (2022) (Code)
- Oriented RepPoints for Aerial Object Detection (2022) (Code)
- Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly (2022) (Code)
- EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications (2022) (Code)
- Awesome Visual Diffusion Models
- Vision Transformer Adapter for Dense Predictions (2022) (Code)
- Activating More Pixels in Image Super-Resolution Transformer (2022) (Code)
- PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies (2022) (Code)
- GMFlow: Learning Optical Flow via Global Matching (2022) (Code)
- Vector-quantized Image Modeling with Improved VQGAN (2021) (JAX Code)
- Learned Vertex Descent: A New Direction for 3D Human Model Fitting (2022) (Code)
- YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors (2022) (Code)
- AITViewer - Set of tools to visualize and interact with sequences of 3D data.
- Object-Compositional Neural Implicit Surfaces (Code)
- Awesome Egocentric Vision
- MonoScene: Monocular 3D Semantic Scene Completion (2022) (Code)
- Visual Prompt Tuning (2022) (Code)
- Unified Implicit Neural Stylization (2022) (Code)
- 3D-Aware Semantic-Guided Generative Model for Human Synthesis (2021) (Code)
- Text2LIVE: Text-Driven Layered Image and Video Editing (2022) (HN)
- HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction (2022) (Code)
- Generalization of Otsu's Method and Minimum Error Thresholding (2020)
- XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model (2022) (Code)
- Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation (2021) (Code)
- Deformable Sprites for Unsupervised Video Decomposition (2022) (Code)
- Topologically-Aware Deformation Fields for Single-View 3D Reconstruction (2022) (Code)
- Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation (2021) (Code)
- Refign: Align and Refine for Adaptation of Semantic Segmentation to Adverse Conditions (2022) (Code)
- Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry (2021) (Code)
- Box-supervised Instance Segmentation with Level Set Evolution (2022)
- Tent: Fully Test-Time Adaptation by Entropy Minimization (2021) (Code)
- UniFormer: Unifying Convolution and Self-attention for Visual Recognition (2022) (Code)
- MOTR: End-to-End Multiple-Object Tracking with Transformer (2022) (Code)
- Towards Grand Unification of Object Tracking (2022) (Code)
- Benchmarking Omni-Vision Representation through the Lens of Visual Realms (2022) (Code)
- Color Histograms in Image Retrieval
- SeqTR: A Simple yet Universal Network for Visual Grounding (2022) (Code)
- Image Inpainting with External-internal Learning and Monochromic Bottleneck (2021) (Code)
- Deep Image Homography Estimation (2016) (Code)
- Illumination Adaptive Transformer (2022) (Code)
- MotionCLIP: Exposing Human Motion Generation to CLIP Space (2022) (Code)
- Awesome Image Composition
- Scene Text Recognition with Permuted Autoregressive Sequence Models (2022) (Code)
- Multimodal Masked Autoencoders Learn Transferable Representations (Code)
- BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection (2022) (Code)
- BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving (2022) (Code)
- AdaNeRF: Adaptive Sampling for Real-time Rendering of Neural Radiance Fields (2022) (Code)
- Harmonizer: Learning to Perform White-Box Image and Video Harmonization (2022) (Code)
- CVAT - Computer Vision Annotation Tool. (Code)
- NeuMesh: Learning Disentangled Neural Mesh-based Implicit Field for Geometry and Texture Editing (2022)
- Monocular 3D Object Detection with Depth from Motion (2022) (Code)
- Masked Discrimination for Self-Supervised Learning on Point Clouds (2022) (Code)
- SORT - Simple, online, and real-time tracking of multiple objects in a video sequence.
- Local Color Distributions Prior for Image Enhancement (2022) (Code)
- S2Contact: Graph-based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning (2022) (Code)
- Is Attention All NeRF Needs? (2022) (Code)
- Camouflaged/Concealed Object Detection
- Accelerate Vision Transformer (ViT) with Quantization using Optimum (2022)
- Optimizing Transformers for GPUs with Optimum (2022)
- Photogrammetry Guide (HN)
- Multi-View Mesh Reconstruction with Neural Deferred Shading (2022) (Code)
- Initialization and Alignment for Adversarial Texture Optimization (2022) (Code)
- DCT-Net: Domain-Calibrated Translation for Portrait Stylization (2022) (Code)
- Pretraining is All You Need for Image-to-Image Translation (2022) (Code)
- Vision-Centric BEV Perception: A Survey
- Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency (2022) (Code)
- Awesome Weakly Supervised Semantic Segmentation Papers
- GAUDI: A Neural Architect for Immersive 3D Scene Generation (2022) (Code) (HN)
- Multimodal Image Synthesis and Editing: A Survey (2021) (Code)
- High-Resolution Image Synthesis with Latent Diffusion Models (2022) (Code)
- ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters (2022) (Code)
- 3D Vision with Transformers: A Survey (2022)
- Optical Flow Processing Stack
- VideoX - Multi-modal Video Content Understanding.
- Simple Baselines for Image Restoration (2022) (Code)
- Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning (2022)
- Revisiting the Critical Factors of Augmentation-Invariant Representation Learning (2022) (Code)
- Image Quality Related Papers
- Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution (2022) (Code)
- MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation (2022) (Code)