Computer vision

OpenCV - Open Source Computer Vision Library. (Web) (OpenCV Course)

Gluon CV Toolkit - Provides implementations of the state-of-the-art (SOTA) deep learning models in computer vision.

Pythia - Modular framework for vision and language multimodal research. Built on top of PyTorch.

video-object-removal - Just draw a bounding box and you can remove the object you want to remove.

GoCV - Go package for computer vision using OpenCV 4 and beyond.

Sandbox for training convolutional networks for computer vision

Get started with Computer Vision, Deep Learning, and OpenCV

TorchCV - PyTorch-Based Framework for Deep Learning in Computer Vision.

AI Habitat - Flexible, high-performance 3D simulator for Embodied AI research.

Kornia - Open Source Differentiable Computer Vision Library for PyTorch. (Web)

Roboflow - Raw images to trained computer vision model. (Article)

PySlowFast - Open source video understanding codebase from FAIR that provides state-of-the-art video classification models.

How to Convert a Picture to Numbers

Awesome Computer Vision

The Ancient Secrets of Computer Vision (2018)

Variational Methods for Computer Vision lectures (2013)

Classy Vision - New end-to-end, PyTorch-based framework for large-scale training of state-of-the-art image and video classification models.

Meshroom - 3D Reconstruction Software.

AliceVision - Photogrammetric Computer Vision Framework. (Code) (GitHub)

PyTorch3d - Provides efficient, reusable components for 3D Computer Vision research with PyTorch. (Web)

Face Recognition - World's simplest facial recognition api for Python and the command line.

Deep Hough Voting for 3D Object Detection in Point Clouds

Point Cloud Library - Standalone, large scale, open project for 2D/3D image and point cloud processing.

Disappearing-People - Removing people from complex backgrounds in real time using TensorFlow.js in the web browser. (HN)

Best Practices, code samples, and documentation for Computer Vision

Computer Vision Basics in Microsoft Excel

PolyGen: An Autoregressive Generative Model of 3D Meshes (2020)

Sophus - C++ implementation of Lie Groups using Eigen.

SOLT - Streaming over lightweight data transformations.

Awesome Interaction-aware Behavior and Trajectory Prediction

SynSin: End-to-end View Synthesis from a Single Image (2020) (Code)

Pixel2Mesh - Generating 3D Mesh Models from Single RGB Images.

First Order Motion Model for Image Animation (Code)

PyTorch improved version of TPAMI 2017 paper: Face Alignment in Full Pose Range: A 3D Total Solution

Learning to See Through Obstructions

Learning to Cluster Faces on an Affinity Graph (LTC)

Avatarify - Avatars for Zoom and Skype.

SPSR - PyTorch implementation of Structure-Preserving Super Resolution with Gradient Guidance.

OISR-PyTorch - PyTorch implementation of "ODE-inspired Network Design for Single Image Super-Resolution".

3D Photography using Context-aware Layered Depth Inpainting

CenterMask: Real-Time Anchor-Free Instance Segmentation

Interview with Dmytro Mishkin | Computer Vision Research | Kaggle, ML & Education (2020)

PyTorch code for ICLR-20 Paper "Learning to Explore using Active Neural SLAM"

FaceTracker - Real time deformable face tracking in C++ with OpenCV 3.

Awesome Super Resolution

Adversarial Latent Autoencoders

ElasticFusion - Real-time dense visual SLAM system capable of capturing comprehensive dense globally consistent surfel-based maps of room scale environments explored using an RGB-D camera.

StegaStamp: Invisible Hyperlinks in Physical Photographs

Pose Animator - Takes a 2D vector illustration and animates its containing curves in real-time based on the recognition result from PoseNet and FaceMesh. (HN)

fvcore - Collection of common code that's shared among different research projects in FAIR computer vision team.

Making Sense of Vision and Touch: Multimodal Representations for Contact-Rich Tasks (2020)

ScreenPoint - Project an image centroid to another image using OpenCV.

U^2-Net - Code for our newly accepted paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection".

TorchIO - Tools for medical image processing in deep learning.

Real time Image Animation in OpenCV using first order model (HN)

OpenMV (Open-Source Machine Vision) - Aims at making machine vision more accessible to beginners by developing a user-friendly, open-source, low-cost machine vision platform.

TSD - 1st place models in Google OpenImage Detection Challenge 2019.

Training-Time-Friendly Network for Real-Time Object Detection

Big Transfer (BiT): General Visual Representation Learning

Fast Human Pose Estimation CVPR2019

Deep High-Resolution Representation Learning for Human Pose Estimation

Background Matting: The World is Your Green Screen

DE⫶TR: End-to-End Object Detection with Transformers

PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization

Tracking Objects as Points

VIBE - Video Inference for Human Body Pose and Shape Estimation.

SRZoo - Integrated repository for super-resolution using deep learning.

mAP (mean Average Precision) - Evaluates the performance of your neural net for object recognition.

Neural Pose Transfer by Spatially Adaptive Instance Normalization (2020)

Awesome Neural Rendering

Learning To Classify Images Without Labels

Deep Leakage From Gradients (2019)

3Dflow - Offers customized computer vision software solutions.

labelme - Image Polygonal Annotation with Python.

imgviz - Image Visualization Tools.

Attention-Guided Hierarchical Structure Aggregation for Image Matting

YOLOv5 Is Here: State-of-the-Art Object Detection at 140 FPS (2020) (HN) (Code)

DetectoRS - Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution.

PyTorch implementation of paper Real-time Facial Surface Geometry from Monocular Video on Mobile GPUs

VirTex: Learning Visual Representations from Textual Annotations

High-Resolution 3D Human Digitization from A Single Image

FairMOT - Simple baseline for one-shot multi-object tracking.

Implicit Neural Representations with Periodic Activation Functions (2020)

MSeg: A Composite Dataset for Multi-Domain Segmentation

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

MMDetection - OpenMMLab Detection Toolbox and Benchmark.

Fourier Feature Networks in TensorFlow 2

Computer Vision Lab | ETH Zurich

PyTorch Computer Vision Library for Experts and Beginners (2020)

Computer Vision Pretrained Models

Fawkes: Image “Cloaking” for Personal Privacy (HN)

Motion - Software motion detector.

Supervised 3D Mesh Reconstruction (2020)

NeRF in the Wild - Neural Radiance Fields for Unconstrained Photo Collections.

NASA: Neural Articulated Shape Approximation (2020)

An Overview of Deep Learning Architectures in Few-Shot Learning Domain (2020)

FutureMapping: The Computational Structure of Spatial AI Systems (2018) (Tweet)

Optimal Peanut Butter and Banana Sandwiches (2020) (Twitter)

Gesture Recognition with Line Integrals (Code)

Computer Vision: Looking Back to Look Forward (2020)

DAIN (Depth-Aware Video Frame Interpolation)

Picsellia - Development platform dedicated to Computer Vision.

Official implementation of "PifPaf: Composite Fields for Human Pose Estimation" in PyTorch

Object Recognition with Gradient-Based Learning (1999)

Imaginaire - NVIDIA PyTorch GAN library with distributed and mixed precision support. (Docs)

DeepBackSub - Virtual Video Device for Background Replacement with Deep Semantic Segmentation.

Awesome Tiny Object Detection

Flow-edge Guided Video Completion

5 Things to look for in a Computer Vision startup job (2020)

Transformers for Image Recognition at Scale (2020) (HN)

nnU-Net - Segmentation method designed to deal with dataset diversity.

batchgenerators - Framework for data augmentation for 2D and 3D image classification and segmentation.

Lookuq - App to create object detection projects without coding. (HN)

InsightFace - Face Analysis Project on MXNet. (Web)

PyTorch implementation of SwAV (Swapping Assignments between Views)

Asymmetric Loss For Multi-Label Classification in PyTorch

Antialiased CNNs - Making Convolutional Networks Shift-Invariant Again.

Perceptual Similarity Metric and Dataset - Unreasonable Effectiveness of Deep Features as a Perceptual Metric.

Deep Learning Anime Papers

Vision Transformer - Models from the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.

Handsfree.js - Wrapper library around computer vision models for working with face pointers, assistive tech, and creative expression. (Web)

ZeroQ: A Novel Zero Shot Quantization Framework

SqueezeNext - Contains the Caffe implementation of SqueezeNext.

ANODE: Adjoint Based Neural ODEs

Python Video Stabilization using OpenCV

Recent Advances in Vision and Language PreTrained Models (VL-PTMs)

TorchCV - PyTorch vision library that mimics ChainerCV.

Vision Transformer in PyTorch

MedicalTorch - Medical imaging framework for PyTorch. (Docs)

imagecluster - Cluster images based on image content using a pre-trained deep neural network, optional time distance scaling and hierarchical clustering.

Detecto - Build fully-functioning computer vision models with PyTorch. (Docs)

EmoPy - Deep neural net toolkit for emotion analysis via Facial Expression Recognition (FER).

PyTorch Implementation of "NVAE: A Deep Hierarchical Variational Autoencoder"

Label Decoupling Framework for Salient Object Detection

MONAI - PyTorch-based, open-source framework for deep learning in healthcare imaging, part of PyTorch Ecosystem. (Web)

Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection

Faster R-CNN Explained for Object Detection Tasks (2020)

How to Install OpenCV on a Raspberry Pi (2020)

Contextual Encoder-Decoder Network for Visual Saliency Prediction

PyImageSearch - Master Computer Vision, Deep Learning, and OpenCV.

Natural Adversarial Examples - Harder ImageNet Test Set.

How to upload 50 OpenCV frames into cloud storage within 1 second (2020)

Egocentric Videoconferencing (2020) - Method for egocentric videoconferencing that enables hands-free video calls, for instance by people wearing smart glasses or other mixed-reality devices. (Video overview)

gradslam - Open source differentiable dense SLAM library for PyTorch.

High-Resolution Daytime Translation Without Domain Labels

Holistically-Nested Edge Detection

pycls - Image classification codebase, written in PyTorch.

PyTorch implementation of High-Fidelity Generative Image Compression + Routines for neural image compression

How Useful is Self-Supervised Pretraining for Visual Tasks?

PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models

InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image

Multi-object trackers in Python - Easy to use implementation of various multi-object tracking algorithms.

Stanford Vision and Learning Lab (GitHub)

Learning computer vision. Overview of methods and software (2018)

Image embeddings. Image similarity and building (2020) (Code)

All You Need to Know About Object Detection Systems (2020)

Lightly - Computer vision framework for self-supervised learning.

DISK: Learning local features with policy gradient (2020) (Code)

Caer - Lightweight Computer Vision library for high-performance AI research. (Intro)

Awesome Image to Image Translation Papers

EfficientDet: Scalable and Efficient Object Detection, in PyTorch

UNet: semantic segmentation with PyTorch

Exploring Simple Siamese Representation Learning (2020) (Code) (Code)

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions

Nerfies: Deformable Neural Radiance Fields (Code)

Timeception for Complex Action Recognition (2019) (Code)

Programming Computer Vision with Python (2014) (Code) (Notes)

Fast and Accurate One-Stage Space-Time Video Super-Resolution (2020)

pixelNeRF: Neural Radiance Fields from One or Few Images (2020) (Code)

vedadet - Single stage object detector toolbox based on PyTorch.

OneNet: End-to-End One-Stage Object Detection by Classification Cost

Consistent Video Depth Estimation - Estimate dense, flicker-free, geometrically consistent depth from monocular video, for example hand-held cell phone video.

Implicit Neural Representations with Periodic Activation Functions

Computational Imaging Stanford Lab

Trimap-Free Solution for Portrait Matting in Real Time

Local Light Field Fusion

Awesome Crowd Counting

Neural Sparse Voxel Fields (NSVF)

One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing (2020) (Tweet)

SharpAI DeepCamera - Open source stack for machine learning engineering with private deployment and AutoML for edge computing. (HN)

Contrastive learning of global and local features for medical image segmentation with limited annotations

Real-Time High-Resolution Background Matting (2020) (Code)

Torchreid - Deep learning person re-identification in PyTorch.

Unsupervised Embedding Learning via Invariant and Spreading Instance Feature

img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation

SSD: Single Shot MultiBox Detector | a PyTorch Tutorial to Object Detection

PCT: Point Cloud Transformer (2020) (Code)

Learning Continuous Image Representation with Local Implicit Image Function (2020) (Code)

Computer Vision Annotation Tool (CVAT)

DeiT: Data-efficient Image Transformers

Awesome Implicit Neural Representations

ImageAI - Python library built to empower developers to build applications and systems with self-contained Computer Vision capabilities. (Web)

RAIVN Lab - Reasoning, AI and VisioN (RAIVN) Lab. (GitHub)

Norfair - Customizable lightweight Python library for real-time 2D object tracking.

Universal Style Transfer in PyTorch

NVIDIA Deep learning Dataset Synthesizer (NDDS)

Object Detection at 2530 FPS with TensorRT and 8-Bit Quantization (2020)

HTML4Vision - Simple HTML visualization tool for computer vision research.

Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders

Taming Transformers for High-Resolution Image Synthesis

X-Temporal - Easily implement SOTA video understanding methods with PyTorch on multiple machines and GPUs.

NanoDet - Super fast and lightweight anchor-free object detection model. Real-time on mobile devices.

PyTorch Image Models

Awesome Vision and Language - Curated list of awesome vision and language resources.

DropBlock: A regularization method for convolutional networks (2018) (Code)

Glasses - Compact, concise and customizable deep learning computer vision library. (Web)

Explorable Super Resolution (2019)

PySceneDetect - Python and OpenCV-based scene cut/transition detection program & library.

Best Practices for Building Computer Vision Models (2021)

TIDE - General Toolbox for Identifying Object Detection Errors.

Sparse R-CNN: End-to-End Object Detection with Learnable Proposals (2020) (Code)

Unsplash Image Search - Search photos on Unsplash using natural language.

Kimera Semantics - Real-Time 3D Semantic Reconstruction from 2D data.

Voxblox++ - Volumetric object-level semantic mapping framework.

Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Surfaces (Code)

Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video (2020) (Code)

DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation (2019) (Code)

Awesome Neural Radiance Fields

D2Det: Towards High Quality Object Detection and Instance Segmentation (2020)

DetCo: Unsupervised Contrastive Learning for Object Detection (2021) (Code)

Computer Vision Video Lectures - Curated list of free, high-quality, university-level courses with video lectures related to the field of Computer Vision.

Cord - Training data toolbox for computer vision. (HN)

Text-Guided Editing of Images (Using CLIP and StyleGAN)

torchvision - Datasets, Transforms and Models specific to Computer Vision. (Web)

MeInGame: Create a Game Character Face from a Single Portrait (2021) (Code)

Awesome Deep Vision

dataset-tools - Tools for quickly normalizing image datasets.

Using Streamlit to visualize object detection output (2021)

Mobile Computer Vision @ Facebook

Opening the black box of vision AI algorithms (2021)

CompreFace - Free face recognition solution that can be easily integrated into any IT system without prior machine learning skills.

IBRNet: Learning Multi-View Image-Based Rendering (2021) (Code)

From Coarse to Fine: Robust Hierarchical Localization at Large Scale (2019) (Code)

Camera Response Function (2021)

I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image (2020) (Code)

SkipNet: Learning Dynamic Routing in Convolutional Networks (2018) (Code)

Mrcal - Camera Calibrations and More. (HN)

Digging Into Self-Supervised Monocular Depth Estimation (2019) (Code) (Code)

VISSL - FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images. (Web)

Zumo Labs - Generate custom synthetic data sets that result in more robust and reliable computer vision models. (GitHub)

Oriented Object Detection in Aerial Images with Box Boundary-Aware Vectors (2020) (Code)

Perceiver: General Perception with Iterative Attention (2021) (Code)

SEER: The start of a more powerful, flexible, and accessible era for computer vision (2021)

NerFACE: Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction (2021)

Neural 3D Video Synthesis

Involution: Inverting the Inherence of Convolution for Visual Recognition (2021) (Code)

Awesome Causality in Computer Vision

Vision Transformers for Dense Prediction (2021) (Code)

LoFTR: Detector-Free Local Feature Matching with Transformers (2021) (Code)

ccv - C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library.

Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes (2020) (Code)

AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control (2021) (Tweet)

Computer Vision and Embroidery (2021) (Code)

mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields (2021)

Python libraries I use every day for computer vision work (2021)

Awesome Temporal Sentence Grounding in Videos

The Affective Growth of Computer Vision

Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D (2020) (Code)

End-to-End Video Instance Segmentation with Transformers (2021) (Code)

SAHI: Slicing Aided Hyper Inference

FOVO: A new 3D rendering technique based on human vision (2020) (HN)

Is Space-Time Attention All You Need for Video Understanding? (2021) (Code)

Awesome Visual-Transformer - Transformer with Computer-Vision (CV) papers.

PyTorchVideo - Deep learning library for video understanding research. (Web)

Self-supervised Video Object Segmentation by Motion Grouping (2021) (HN) (Code)

torchvideo - Datasets, transforms and samplers for video in PyTorch.

A General and Adaptive Robust Loss Function (2019) (Code)

Self-supervising Fine-grained Region Similarities for Large-scale Image Localization (2020) (Code)

MaX-DeepLab: Dual-Path Transformers for End-to-End Panoptic Segmentation (2021)

Vizy - AI Camera.

MMPX Style-Preserving Pixel Art Magnification (2021) (HN)

Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion (Code)

SuperPoint: Self-Supervised Interest Point Detection and Description (2018) (Code)

Multi-Stage Progressive Image Restoration (2021) (Code)

COLMAP - General-purpose Structure-from-Motion (SfM) and Multi-View Stereo (MVS) pipeline with a graphical and command-line interface. (Docs)

Awesome Vision-based SLAM / Visual Odometry

Barlow Twins: Self-Supervised Learning via Redundancy Reduction (2021) (Code)

HIPCL - OpenCL/SPIR-V implementation of HIP.

MMCV - Foundational library for computer vision research that supports many research projects. (Docs)

MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding (2021) (Code)

Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples (2021) (Code) (Code)

Emerging Properties in Self-Supervised Vision Transformers (2021) (Code) (Tweet) (Tweet)

Geometry-Free View Synthesis: Transformers and no 3D Priors (2021) (Code)

Easily Transform Portraits of People into AI Aberrations Using StyleCLIP (2021)

DeepMetaHandles: Learning Deformation Meta-Handles of 3D Meshes with Biharmonic Coordinates (2021) (Code)

Onepanel - Open and extensible integrated development environment (IDE) for computer vision. (Web)

Vector Neurons: A General Framework for SO(3)-Equivariant Networks (2021) (Code)

ISTR: End-to-End Instance Segmentation with Transformers (2021) (Code)

MLP-Mixer: An all-MLP Architecture for Vision (2021) (Code) (Code)

Self-attention building blocks for computer vision applications in PyTorch

LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference

Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary (2021) (Web) (Code)

Neural Rendering: How Low Can You Go in Terms of Input? (2021)

Enhancing Photorealism Enhancement (2021) (Paper) (Code)

DeepFaceEditing: Deep Face Generation and Editing with Disentangled Geometry and Appearance Control (2021) (Code)

Omnimatte: Associating Objects and Their Effects in Video (2021)

Rethinking "Batch" in BatchNorm (2021)

UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation (2020) (Code)

Synthetic for Computer Vision - List of synthetic datasets and tools for computer vision.

vision_blender - Blender addon for generating synthetic ground truth data for Computer Vision applications.

Easy Few-Shot Learning - Ready-to-use code and tutorial notebooks to boost your way into few-shot image classification.

BasicSR (Basic Super Restoration) - Open source image and video restoration toolbox based on PyTorch, for tasks such as super-resolution, denoising, deblurring, and JPEG artifact removal.

Intriguing Properties of Vision Transformers (2021) (Reddit)

DIY Amazon Go – computer vision tutorial for cashierless checkout

Image Retrieval in the Wild (2020)

Awesome Transformer in CV papers

Sensor Calibration from Scratch with Rust (2021)

Tangram Vision - Integrate, Calibrate Perception Sensors For Robots, Drones & Automation. (Blog)

Rust CV - Project to implement computer vision algorithms, abstractions, and systems in Rust.

Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control (2021) (HN)

Robust Instance Segmentation through Reasoning about Multi-Object Occlusion (2021) (Code)

MERLOT: Multimodal Neural Script Knowledge Models (2021) (Tweet)

Scaling Vision Transformers (2021)

Self-Supervised Scene De-occlusion (2020) (Code)

Pivotal Tuning for Latent-based Editing of Real Images (2021) (Code)

FLAME: Articulated Expressive 3D Head Model (Code)

XCiT: Cross-Covariance Image Transformers (2021) (Code)

Robust Consistent Video Depth Estimation (2021) (Code)

cvpods - All-in-one Toolbox for Computer Vision Research.

CDFI: Compression-Driven Network Design for Frame Interpolation (2021) (Code)

NeRF--: Neural Radiance Fields Without Known Camera Parameters (2021) (Code) (Code)

Oxford Active Vision Laboratory (GitHub)

Computer Vision: Algorithms and Applications, 2nd ed.

motionEyeOS - Linux distribution that turns your single board computer into a video surveillance system.

Long-Short Transformer: Efficient Transformers for Language and Vision (2021) (Code)

Feature Visualization – How NNs understand images (2017)

What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis (2019) (Code)

Convolutional Hough Matching Networks (2021) (Code)

Efficient Self-Supervised Vision Transformers (EsViT)

ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases (2021) (Code) (Paper Read) (Article)

CO3D: Common Objects In 3D - Tools for working with the Common Objects in 3D (CO3D) dataset.

ORBIT: A Real-World Few-Shot Dataset for Teachable Object Recognition (2021) (Code)

Vision Transformer Architecture Search (2021) (Code)

TSIT: A Simple and Versatile Framework for Image-to-Image Translation (2020) (Code)

Recognizing People in Photos Through Private On-Device Machine Learning (2021)

CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation (2021) (Code)

HPNet: Deep Primitive Segmentation Using Hybrid Representations (2021) (Code)

Portal - Fastest way to load and visualize your deep neural networks on images and videos.

Awesome Human Pose Estimation

Learning A Single Network for Scale-Arbitrary Super-Resolution (2021) (Code)

PyTorch implementation for Vision Transformer

Repulsive Curves - Model 2D & 3D curves while avoiding self-intersection. (Tweet) (Code) (HN)

SDEdit: Image Synthesis and Editing with Stochastic Differential Equations (Code)

NeX: Real-time View Synthesis with Neural Basis Expansion (2021) (Code)

Convolutional Occupancy Networks (2020) (Code)

Learning Optical Flow from a Few Matches (2021) (Code)

Visual Parser: Representing Part-whole Hierarchies with Transformers (2021) (Code)

Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation (Code)

On Generating Transferable Targeted Perturbations (2021) (Code)

Awesome Scene Understanding - List of papers for scene understanding.

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation (2021) (Code)

DONeRF: Towards Real-Time Rendering of Compact Neural Radiance Fields using Depth Oracle Networks (2021) (Code)

Object Detection in an Hour (2021) (HN)

Fixing the train-test resolution discrepancy (2020) (Code)

Align Deep Features for Oriented Object Detection (2020) (Code)

Vision-Language Transformer and Query Generation for Referring Segmentation (2021) (Code)

Depth-supervised NeRF: Fewer Views and Faster Training for Free (2021) (Code)

SwinIR: Image Restoration Using Swin Transformer (2021) (Code)

You Only Learn One Representation: Unified Network for Multiple Tasks (2021) (Code)

Probabilistic Modeling for Human Mesh Recovery (2021) (Code)

BARF: Bundle-Adjusting Neural Radiance Fields (2021) (Code)

Self-Calibrating Neural Radiance Fields (2021) (Code)

Transformers-Tutorials - Demos I made with the Transformers library by HuggingFace.

3D Human Texture Estimation from a Single Image with Transformers (2021) (Code)

CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval (2021) (Code)

RAFT: Recurrent All Pairs Field Transforms for Optical Flow (2020) (Code)

Volume rendering + 3D implicit surface = Neural 3D Reconstruction

Hierarchical Deep Stereo Matching on High-resolution Images (2019) (Code)

Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering (2021) (Code)

Image Synthesis via Semantic Composition (2021) (Code)

Awesome-Edge-Detection-Papers

Awesome-Image-Colorization

Face Recognition - 2D and 3D Face alignment library built using PyTorch.

Awesome image retrieval papers

PeekingDuck - Modular framework built to simplify Computer Vision inference workloads.

Pri3D: Can 3D Priors Help 2D Representation Learning? (2021) (Code)

FaceXLib - Aims at providing ready-to-use face-related functions based on current SOTA open-source methods.

MMAction2 - Open-source toolbox for video understanding based on PyTorch.

Awesome Collision Detection

Video Super-Resolution Transformer (2021) (Code)

NeRF Atlas - Collection of NeRF extensions for fun and experimentation.

Training and testing codes for USRNet, DnCNN, FFDNet, SRMD, DPSR, MSRResNet, ESRGAN, BSRGAN, SwinIR

Uformer: A General U-Shaped Transformer for Image Restoration (2021) (Code) (Code)

Self-Supervised Pretraining Improves Self-Supervised Pretraining (2021) (Code)

SNARF: Differentiable Forward Skinning for Animating Non-Rigid Neural Implicit Shapes (2021) (Code)

HRFormer: High-Resolution Transformer for Dense Prediction, NeurIPS 2021

IceVision - Agnostic Computer Vision Framework - Pluggable to any Training Library: Fastai, Pytorch-Lightning with more to come. (Docs)

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks (2021) (Tweet)

Attention Gated Networks (Image Classification & Segmentation) in PyTorch

Full-Duplex Strategy for Video Object Segmentation (2021) (Code)

YoHa - Practical hand tracking engine. (HN) (Code)

Deep Learning for Face Anti-Spoofing: A Survey (2021) (Code)

A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation (2021) (Code)

Resolution-robust Large Mask Inpainting with Fourier Convolutions (2021) (Code)

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (2021) (Code) (Code)

ADOP: Approximate Differentiable One-Pixel Point Rendering (2021) (Tweet) (Tweet) (Code)

Patches Are All You Need? (2021) (Code)

ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation (2020) (Code)

Video Panoptic Segmentation (2020) (Code)

Awesome-ICCV2021-Low-Level-Vision - Collection of Papers and Codes for ICCV2021 Low Level Vision and Image Generation.

Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts (2021) (Code)

Non-deep Networks (2021) (Code)

receptivefield - Gradient based receptive field estimation for Convolutional Neural Networks.

Iso-Points: Optimizing Neural Implicit Surfaces with Hybrid Representations (2021) (Code)

Neural Articulated Radiance Field (2021) (Code)

Efficient Visual Pretraining with Contrastive Detection (2021) (Code)

VoTT (Visual Object Tagging Tool) - Open source annotation and labeling tool for image and video assets.

FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes (2021) (Code)

ByteTrack: Multi-Object Tracking by Associating Every Detection Box (2021) (Code)

Dense Video Captioning with Bi-modal Transformer (2020) (Code)

PyTorch-Encoding - CV toolkit for my papers. (Docs)

Space Time Recurrent Memory Network (2021) (Code)

CVNets - Library for training computer vision networks.

Scenic - Jax Library for Computer Vision Research and Beyond. (Paper)

CV Arxiv Daily (Code)

OpenVisionCapsules - Set of libraries for encapsulating smart vision algorithms.

MedMNIST: Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification (Code)

Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language (2021) (Code)

Neural-Pull: Learning Signed Distance Functions from Point Clouds by Learning to Pull Space onto Surfaces (2021) (Code)

The 2021 Image Similarity Dataset and Challenge (2021) (Code)

K-Net: Towards Unified Image Segmentation (2021) (Code)

Yolov5 + Deep Sort with PyTorch

Shape As Points: A Differentiable Poisson Solver (2021) (Code)

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm (2021) (Code)

Awesome Vision-Language Navigation

An Exploration of Embodied Visual Exploration (2021) (Code)

DVC: An End-to-end Deep Video Compression Framework (2019) (Code)

Pixray - Neural image generation.

Unsupervised Learning of Compositional Energy Concepts (2021) (Tweet)

Learning with Noisy Labels for Robust Point Cloud Segmentation (2021) (Code)

Kalidoface - Become a virtual character with just your webcam. (Web)

KalidoKit - Face, Pose, and Hand Tracking Kinematics.

The Ancient Secrets of Computer Vision

Unsupervised Real-world Image Super Resolution via Domain-distance Aware Training (2020) (Code)

PyGaze - Open source eye-tracking software and more. (HN)

Exploring Relational Context for Multi-Task Dense Prediction (2021) (Code)

Neural Scene Graphs for Dynamic Scenes (2021) (Code)

Image Super-Resolution via Iterative Refinement (HN) (Code)

UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning (2021) (Code)

Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers (2021) (Code)

Multimodal Virtual Point 3D Detection (2021) (Code)

SiT: Self-supervised vIsion Transformer

Attention Mechanisms in Computer Vision: A Survey (2021)

Awesome Vision Attention Papers

FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation (2021) (Code)

RenderNet: A deep convolutional network for differentiable rendering from 3D shapes (2018) (Code)

Masked Autoencoders Are Scalable Vision Learners (2021) (Code) (Code) (Code)

BoostingMonocularDepth

It's About Time: Analog Clock Reading in the Wild (2021) (Tweet) (Code)

Learning to Compose Visual Relations (2021) (Code)

LF-Net: Learning Local Features from Images (2018) (Code)

Aligning Pretraining for Detection via Object-Level Contrastive Learning (2021) (Code)

Look at the Variance! Efficient Black-box Explanations with Sobol-based Sensitivity Analysis (2021) (Code)

Deep unfolding network for image super-resolution (2020)

VOLO: Vision Outlooker for Visual Recognition (2021) (Code)

Direct Multi-view Multi-person 3D Pose Estimation (2021) (Code)

Image2Mesh: A learning framework for single image 3D reconstruction (2019) (Code)

GammaCV - WebGL accelerated Computer Vision library for modern web applications. (Web)

Localizing Objects with Self-Supervised Transformers and no Labels (2021) (Code)

Harvester - GenICam-based Image Acquisition Python Library.

NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion (2021) (Code) (PyTorch Code)

ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision (2021) (Code)

MetaFormer is Actually What You Need for Vision (2021) (Code)

ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape Generators (2021) (Code)

Mesa: A Memory-saving Training Framework for Transformers (2021) (Code)

MMPose - Open-source toolbox for pose estimation based on PyTorch. (Docs)

An Empirical Study of Training End-to-End Vision-and-Language Transformers (2021) (Code)

Useful computer vision PhD resources

Tenyks - Data-centric Computer Vision.

Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation (2021) (Code)

GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields (2021) (Code)

Learning to See by Looking at Noise (2021) (Code)

iBOT: Image BERT Pre-Training with Online Tokenizer (2021) (Code)

Grounded Language-Image Pre-training (2021) (Code)

3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction (2016) (Code)

Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks (Code)

Awesome Visual Grounding

Are Transformers More Robust Than CNNs? (2021) (Code)

Plenoxels: Radiance Fields without Neural Networks (2021) (Code) (Code)

GFPGAN - Developing Practical Algorithms for Real-world Face Restoration.

Awesome Video Stabilization

MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo (2021) (Code)

Tracking People with 3D Representations (2021) (Code)

Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection (2019) (Code)

Learning to Stylize Novel Views (2021) (Code)

YOLOX - High-performance anchor-free YOLO. (Docs)

PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop (2021) (Code)

SeqFormer: a Frustratingly Simple Model for Video Instance Segmentation (2021) (Code)

NeRD: Neural Reflectance Decomposition from Image Collections (2021) (Code)

Vector Quantized Diffusion Model for Text-to-Image Synthesis (2021) (Code) (Code) (Code)

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models (2021) (Code)

SynthDet - End-to-end object detection pipeline using synthetic data.

MPViT: Multi-Path Vision Transformer for Dense Prediction (2021) (Code)

StyleSwin: Transformer-based GAN for High-resolution Image Generation (2021) (Code)

Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline (2021) (Code)

SLIP: Self-supervision meets Language-Image Pre-training (2021) (Code)

General Facial Representation Learning in a Visual-Linguistic Manner (2021) (Code) (Code)

HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields (Code) (HN)

Learning to Regress Bodies from Images using Differentiable Semantic Rendering (2021) (Code)

High-Resolution Image Synthesis with Latent Diffusion Models (2021) (Code)

Photorealistic Audio-driven Video Portraits (2020) (Code)

Awesome Hand Pose Estimation

Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers (2021) (Code)

Transformer Interpretability Beyond Attention Visualization (2021) (Code)

StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis (2021) (Code)

Light Field Image Super-Resolution with Transformers (2021) (Code)

Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes (2021) (Code)

DeepSIM: Image Shape Manipulation from a Single Augmented Training Sample (2021) (Code)

RAFT-3D: Scene Flow using Rigid-Motion Embeddings (2021) (Code)

Unsupervised Indoor Depth Estimation (2020) (Code)

A-NeRF: Articulated Neural Radiance Fields for Learning Human Shape, Appearance, and Pose (2021) (Code)

Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective (2021) (Code)

Sara - Easy-to-Use C++ Computer Vision Library.

RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching (2021) (Code)

U-2-Net: Going Deeper with Nested U-Structure for Salient Object Detection (2020) (Code)

Language as Queries for Referring Video Object Segmentation (2022) (Code)

Localization with Sampling-Argmax (2021) (Code)

VOCA: Voice Operated Character Animation (Code)

CVZone - Computer vision package that makes it easy to run image processing and AI functions.

Deepface - Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python.

Location-aware Single Image Reflection Removal (2021) (Code)

MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement (2021) (Code)

Detecting Twenty-thousand Classes using Image-level Supervision (2022) (Code)

Language-driven Semantic Segmentation (2022) (Code)

Rethinking Nearest Neighbors for Visual Classification (2021) (Code)

Vision Transformer with Deformable Attention (2022) (Code) (Code)

KerasCV - Industry-strength Computer Vision workflows with Keras.

Instant Neural Graphics Primitives - Lightning fast NeRF and more.

Dynamic Head: Unifying Object Detection Heads with Attentions (2021) (Code)

ELSA: Enhanced Local Self-Attention for Vision Transformer (2021) (Code)

FFCV - Fast Forward Computer Vision (and other ML workloads!) (Web)

Awesome ViT - Curated list and survey of awesome Vision Transformers.

Instant Neural Graphics Primitives with a Multiresolution Hash Encoding (2022) (Code) (Code) (Video Summary) (HN)

Road Extraction by Deep Residual U-Net (2017) (Code)

Single-Stage 6D Object Pose Estimation (2019) (Code)

Visual Task Adaptation Benchmark (VTAB)

TAda! Temporally-Adaptive Convolutions for Video Understanding (2022) (Code)

UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction (2021) (Code)

Co-Fusion: Real-time Segmentation, Tracking and Fusion of Multiple Objects (2020) (Code)

VRT: A Video Restoration Transformer (2021) (Code)

Unknown Object Segmentation from Stereo Images (2021) (Code)

Stacked Cross Attention for Image-Text Matching (2018) (Code)

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (2022) (Code)

DeFlow: Learning Complex Image Degradations from Unpaired Data with Conditional Flows (2021) (Code)

DocFormer: End-to-End Transformer for Document Understanding (2022) (Code)

SeMask: Semantically Masked Transformers for Semantic Segmentation (2021) (Code)

Image Quality Assessment: Unifying Structure and Texture Similarity (2020) (Code)

Learning Super-Features for Image Retrieval (2022)

YOLOv7 - Framework Beyond Detection.

A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model (2021) (Code)

Single/Multiple Object Tracking and Segmentation

Learnable Multi-level Frequency Decomposition and Hierarchical Attention Mechanism for Generalized Face Presentation Attack Detection (2021) (Code)

HifiFace: 3D Shape and Semantic Prior Guided High Fidelity Face Swapping (2021) (Code)

Scalable Large Scene Neural View Synthesis (2022) (HN)

Transformer Recipe - Quick recipe to learn all about Transformers.

NeROIC: Neural Rendering of Objects from Online Image Collections (2022) (Code)

DiffusionNet: Discretization Agnostic Learning on Surfaces (2022) (Code)

FILM: Frame Interpolation for Large Motion (2022) (Code)

Learning Signed Distance Field for Multi-view Surface Reconstruction (2021) (Code)

Deep Metric Learning in PyTorch

ICON: Implicit Clothed humans Obtained from Normals (2021) (Code)

CLIPasso: Semantically-Aware Object Sketching (2022) (Code)

BANMo: Building Animatable 3D Neural Models from Many Casual Videos (2022) (Code)

How Do Vision Transformers Work?

Top 10 Computer Vision Papers of 2021

Exploring Sparsity in Image Super-Resolution for Efficient Inference (2021) (Code)

AutoInt: Automatic Integration for Fast Neural Volume Rendering (2021)

Learning to Prompt for Vision-Language Models (2021) (Code)

Summarizing Videos with Attention (2019) (Code)

vkit - Toolkit designed for CV (Computer Vision) developers. (Docs)

Generative Adversarial Graph Convolutional Networks for Human Action Synthesis (2021) (Code)

Awesome Image Matting

Image-to-Markup Generation with Coarse-to-Fine Attention (Code)

Push-ups with Python, mediapipe and OpenCV (HN)

Lama-cleaner - Image inpainting tool powered by LaMa.

Vision-Language Pre-Training with Triple Contrastive Learning (2022) (Code)

3D Machine Learning resources/papers

FiftyOne - Open-source tool for building high-quality datasets and computer vision models.

Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut (2022) (Code)

Awesome Multiple Object Tracking

Rethinking Coarse-to-Fine Approach in Single Image Deblurring (2021) (Code)

Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling (2021) (Code)

As-ViT: Auto-scaling Vision Transformers without Training (2022) (Code)

Awesome 3D Body Papers

RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth (2021) (Code)

Image Similarity Challenge

Blended Diffusion for Text-driven Editing of Natural Images (2021) (Code)

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization (2021) (Code)

Awesome Object Pose

Video Enhancement papers/resources

PowerQE: An Open Framework for Quality Enhancement of Compressed Visual Data

Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels (2022) (Code)

Accurate Image Alignment and Registration Using OpenCV (2022) (HN)

Video Grounding and Captioning

Awesome Detection Transformer

StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis (2021) (Code) (Web) (HN)

Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition (2020) (Code)

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation (2021) (Code)

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection (2022) (Code)

Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation (2022)

CycleMLP: A MLP-like Architecture for Dense Prediction (2022) (Code)

Image Quality Assessment Benchmark

StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation (2021) (Code)

Transformers, originally designed to handle language, are taking on vision (2022) (HN)

Fast Image Processing with Fully-Convolutional Networks (2017) (Code)

Efficient Attention: Attention with Linear Complexities (2020) (Code)

Label-Efficient Semantic Segmentation with Diffusion Models (2022) (Code)

hloc - Modular toolbox for state-of-the-art 6-DoF visual localization.

All Tokens Matter: Token Labeling for Training Better Vision Transformers (2021) (Code)

Deformable ConvNets v2: More Deformable, Better Results (2018) (Code)

Restormer: Efficient Transformer for High-Resolution Image Restoration (2021) (Code)

Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice (2022) (Code)

NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video (2021) (Code)

Awesome 3D Human Reconstruction

Awesome 3D Human Resources List

A ConvNet for the 2020s (2022) (Code) (Code)

Remote-sensing-image-semantic-segmentation - Improved U-Net-based networks for semantic segmentation of remote sensing images, built on Keras.

Animatable Neural Radiance Fields for Modeling Dynamic Human Bodies (2021) (Code)

TensoRF: Tensorial Radiance Fields (2022) (Code)

Autoregressive Image Generation using Residual Quantization (2022) (Code) (Code)

Pix2Pix Timbre Transfer

One-Shot Adaptation of GAN in Just One CLIP (2022) (Code)

PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds (2021) (Code)

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training (2022) (Code)

Awesome Masked Image Modeling

BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training (2022) (Code)

A Transformer-Based Siamese Network for Change Detection (2022) (Code)

Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition (2021) (Code)

Robust fine-tuning of zero-shot models (2022) (Code)