DevOps
I think Railway is an amazing example of doing DevOps right. Porter & Backstage are nice too.
Google SRE Book is great. Airplane is nice for exposing common commands for everyone on the team to use.
Notes
Links
- Ask HN: What is the fastest way to ramp up on DevOps, k8 and GCP? (2021)
- DevOps, SRE, and Platform Engineering (2021)
- We're Reddit's Infrastructure team, ask us anything! (2018)
- Vercel - Develop. Preview. Ship. (Web)
- Now Examples - Examples of Now deployments you can use.
- I forgot how to manage a server (2019) (HN)
- Applikatoni - Self-hosted deployment server for your team.
- Lobsters: What’s your container-less deployment process? (2019)
- A developer goes to a DevOps conference (2019) (HN)
- Deploy your side-projects at scale for basically nothing - Google Cloud Run (2020) (HN)
- DevOps Questions & Exercises
- Ops Lessons We All Learn The Hard Way (2020)
- Juju - Simple, secure devops tooling built to manage today's complex applications wherever you run your software. (Web)
- Book Recommendations for the Infrastructure Engineer
- Ask HN: How do you make sure your servers are up as a single founder? (2020)
- CTO.ai - Allows you and your software development team to implement DevOps automations in minutes rather than days.
- Deploys at Slack (2020)
- We Need DevOps for ML Data (2020) (HN)
- Awesome Pipeline - Curated list of awesome pipeline toolkits inspired by Awesome Sysadmin.
- Awesome Sysadmin - Curated list of awesome open source sysadmin resources.
- Using SRE to meet reliability challenges | Google Cloud (2020)
- Gruntwork - DevOps as a Service.
- pyinfra - Automates infrastructure super fast at massive scale. It can be used for ad-hoc command execution, service deployment, configuration management and more. (HN)
- Testinfra - Write unit tests in Python to test actual state of your servers configured by management tools like Salt, Ansible, Puppet, Chef and so on.
- PagerDuty Incident Response Documentation (Code)
- Building an online community around learning from incidents (2019) (HN)
- The Rise of Platform Engineering (2020) (HN)
- How we monitor our services at SourceHut (2020)
- Reference checklist for going to production
- Revolv - Create a complete cloud architecture on your Amazon Web Services, Google Cloud Platform or Microsoft Azure account. (HN)
- Clutch - Extensible platform for infrastructure management. (Announcement)
- What is DevOps? (2020)
- Sysdig - Security, Compliance & Performance for your DevOps Workflows.
- A List of Skills and Practices We Use to Train Our DevOps Internally (2020)
- Bridgecrew - Codified cloud security for DevOps. (GitHub)
- You Reap What You Code (2020)
- How we use HashiCorp Nomad (2020) (HN)
- Ask HN: Has anyone moved from Kubernetes to Nomad? (2020)
- Qovery - Deploy your apps on any Cloud providers in just a few seconds. (Web)
- packagecloud - Private NPM registry and Maven, RPM, DEB, PyPI and RubyGem Repository.
- Gravitational - Remote Access and Secure Deployments.
- DeployHQ - Automatically build and deploy code from your repositories.
- Cooking Infrastructure by Chef (Code)
- Unleash - Open source feature toggle service. (Code) (GitHub)
- The golden age of configuration languages (2020) (HN)
- School of SRE (Code)
- Christine Dodrill: ex-SRE, Lightspeed (2020)
- driftctl - Detect, track and alert on infrastructure drift. (Code)
- Shipyard - Modern cloud native development environments. (Web)
- FAUN - DevOps community.
- DevOps Maturity Framework
- Bitnami - Packaged Applications for Any Platform - Cloud, Container, Virtual Machine. (GitHub)
- Bitnami Library for Kubernetes
- Kira - Project management framework with deep philosophy underneath.
- Site Reliability Engineer Interview Preparation Guide
- fastlane - App automation done right. (Code)
- List of Devops Resources
- werf - Git as a single source of truth. Build. Deploy to Kubernetes. Stay in sync. (Web)
- Zero-downtime deploys with DigitalOcean, GitHub, and Docker (2021)
- Running Nomad for home server (2021) (Lobsters) (HN)
- They SRE - Curated Collection on Site Reliability Engineering.
- DevOps Resources
- We are far from a better Heroku for production apps in a hyper cloud (2021) (HN)
- coolify - Open-source, self-hostable Heroku and Netlify alternative. (Code) (HN)
- CloudARK - Platform-As-Code. (GitHub)
- Meltano - ELT for the DataOps era. (Code)
- DigitalOcean Agent - Collects system metrics from DigitalOcean Droplets.
- Pulumi - Modern Infrastructure as Code. Any cloud, any language. (Code) (HN) (HN 2) (Awesome)
- Piku - Tiniest PaaS you've ever seen. Piku allows you to do git push deployments to your own servers. (GitHub)
- Awesome Incident Response
- Fleet - Open source device management. (Code)
- Reliably CLI - Reliability as Code: SRE automation at the tip of your fingers. (Web)
- To PaaS or not (2021)
- SRE at Google: Our complete list of CRE life lessons (2021)
- Bad Machinery: Managing Interrupts Under Load
- Securing DevOps: Security in the Cloud (2018)
- Craft - Universal Release Tool (And More).
- DevOps Cheat Sheets (Code)
- MegaEase - High Performance Software Architecture. (GitHub)
- Erda - Enterprise-grade application building, deploying, monitoring platform.
- DevOps Engineering Course for Beginners (2021)
- How to improve your website’s uptime (2021)
- Peanut - Deploy Databases and Services Easily for Development and Testing Pipelines. (Web)
- DevOps Engineer Crash Course (2021)
- Artillery.io - Modern load testing & smoke testing for SRE and DevOps. (Code)
- Top-10 talks of SREcon18 Europe (2018)
- The DevOps: A Concise Understanding to the DevOps Philosophy and Science. (Technical Report) (2021)
- Cachito - Caching service for source code and external dependencies.
- envsafe - Makes sure you don't accidentally deploy apps with missing or invalid environment variables.
- Uptime Kuma - Fancy self-hosted monitoring tool. (HN)
- Ask HN: Solo-preneurs, how do you DevOps to save time? (2021)
- How to Use Hydra as your Deployment Source of Truth (2021) (Lobsters)
- What to Ask in an SRE Technical Interview (2021)
- DevOps Newsletters of Note
- batou - Helps you to automate your application deployments using Python DSL. (Docs)
- Smallstep - Automated Certificate Management for DevOps. (GitHub)
- Learn-by-Doing Platforms for Dev, DevOps, and SRE Folks (2021)
- StackStorm - Platform for integration and automation across services and tools, taking actions in response to events. (Code)
- Grafana OnCall - Easy-to-use on-call management tool. (HN)
- Ironic - Service for managing and provisioning Bare Metal servers.
- Scaled Agile DevOps Maturity Framework - Enterprise transformation without the risk of culture change.
- Plunder - Single-binary server designed to make provisioning servers, platforms and applications easier.
- Equinix Metal Images
- Cloud Droid - Cloud Incident and Response Simulations.
- The Reports of Devops's death are greatly exaggerated (2021)
- hcltm - Threat Modeling with HCL.
- Hyperping - Uptime monitoring with public status pages.
- hashi-up - Bootstrap HashiCorp Consul, Nomad, or Vault over SSH in under a minute.
- faas-nomad - OpenFaaS provider for Nomad.
- A Multi Cluster and Multi Orchestrator home lab (2021)
- DevOps in academic research (2021)
- Hetzner Pulumi Intro (2021)
- The Operator Pattern in Nomad (2021)
- Dev Lake - Brings all your DevOps data into one practical, personalized, extensible view. Ingest, analyze, and visualize data.
- Fastly Resource Provider
- OOPS (Learning from the incident you didn't have) writeup template (2021)
- Awesome DevOps
- Ultimate DevSecOps library
- Common Infrastructure Errors I've Made (2021) (Lobsters) (HN)
- Lightweight Experiment & Resource Monitoring
- Howie: The Post-Incident Guide
- Jeli - Dedicated Incident Analysis Platform.
- Zero - Opinionated infrastructure to take you from idea to production on day one. (Code)
- ClusterDev - Cloud Management and Automation Framework. (Code)
- Deployment from Scratch - Complete guide to web application deployment. (HN) (One year of sales)
- Awesome Event IDs - Collection of Event ID resources useful for Digital Forensics and Incident Response.
- Cloudkeeper - “housekeeping for clouds” - find leaky resources, manage quota limits, detect drift and clean up.
- Atomist - Keep Your Containerized Applications Safe. (GitHub)
- UpCheck - Declarative checker for website uptime to run continuously for monitoring.
- GRR - Incident response framework focused on remote live forensics.
- OWASP DevSecOps Guideline - Can help us embed security as part of the development pipeline.
- DevOps by Example
- Brev.dev - Your local-only cloud computer. (CLI)
- Goss - Quick and Easy server testing/validation.
- FeatureHub - Cloud native feature flags, A/B testing and remote configuration service. (Code)
- waifud - A few tools to help me manage and run virtual machines across a homelab cluster. (Progress Report)
- Delayed Job vs. Sidekiq (2022) (HN)
- Cincinnati - Update protocol designed to facilitate automatic updates.
- Ministry of Justice Modernization Platform - Defined and managed in Terraform.
- fw - Workspace productivity booster.
- Motive - Programmable task runner built with Rust that uses a special version of Lua. (Reddit)
- DevStream - Open-source DevOps toolchain manager (DTM).
- Opta - Infrastructure-As-Code framework where you work with high-level constructs instead of getting lost in low level cloud configuration.
- Yaru - Command line tool that manages simple tasks.
- Site Reliability Engineering University
- EaseProbe - Simple, standalone, and lightweight tool that can do health/status checking, written in Go.
- Porter - Enables you to package your application artifact, client tools, configuration and deployment logic together as a versioned bundle that you can distribute, and install with a single command. (Code)
- Fiberplane - Collaborative notebooks for debugging your incidents. (GitHub)
- Fundamentals & Deployment (2022) (HN)
- Entropy - Framework to safely and predictably create, change, and improve modern cloud applications and infrastructure using familiar languages, tools, and engineering practices.
- Firefly - Bring your cloud up-to-code. (GitHub)
- bldr - Tool to build and package software distributions. The build process runs in buildkit (or docker buildx), and the build result can be exported as a container image.
- Boost Note - Document driven project management tool that maximizes remote DevOps team velocity. (Code)
- arx - Bundles code and a job to run for local or remote execution.
- Open Build Service - Generic system to build and distribute binary packages from sources in an automatic, consistent and reproducible way. (CLI)
- JReleaser - Release projects quickly and easily. (Web)
- 90 Days of DevOps
- Glances - Cross-platform monitoring tool which aims to present a large amount of monitoring information through a curses or Web based interface.
- OpenStack Glance - OpenStack project that provides services and associated libraries to store, browse, share, distribute and manage bootable disk images.
- Regula - Tool that evaluates infrastructure as code files for potential AWS, Azure, Google Cloud, and Kubernetes security and compliance violations prior to deployment. (Docs)
- Awesome Site Reliability Engineering Tools
- SRE Cheat Sheet
- Massdriver - Effortless DevOps.
- Checkup - Gather static analysis insights for your projects.
- Gatus - Automated service health dashboard. (Web)
- Lightweight Cluster/Cloud VM Job Management
- Echoes HQ - Developer-friendly activity reports.
- Gasper - Intelligent Platform as a Service (PaaS) used for deploying and managing applications and databases in any cloud topology.
- Post-Incident Review on the Atlassian April 2022 Outage (Lobsters) (HN)
- Founding Uber SRE (HN)
- DevSecOps Playbook
- How we deploy to production over 100 times a day (2022)
- Release - Minimalistic, opinionated, and predictable release automation tool.
- StatusBase - Uptime monitoring tool & beautiful status pages.
- Dagu - Self-contained, standalone no-code workflow executor with a built-in Web UI. Runs DAGs defined in a simple, declarative YAML format similar to GitHub Actions or Argo Workflows.
- atmos - Universal Tool for DevOps and Cloud Automation (works with terraform, helm, helmfile, etc). (Guide)
- Delivery CLI - Command line tool for the workflow capabilities in Chef Automate.
- Spin Cycle - Automate and expose complex infrastructure tasks to teams and services.
- A review of Accelerate: The Science of Lean Software and DevOps (2022)
- sake - Command runner for local and remote hosts.
- Nomad Helper - Useful tools for working with HashiCorp Nomad at scale.
- Interval - Batteries-included approach to building rich internal tools directly in your app’s backend codebase. (Twitter) (Explained)
- Wander - Terminal application for Nomad by HashiCorp.
- Monitoring tiny web services (2022) (Lobsters) (HN)
- Updatecli - GitDevOps Automation Engine.
- Flow Distributed Workflow System - Provides a GRPC API that is used by clients to submit and manage workflows. (Docs)
- Tactical RMM - Remote monitoring & management tool, built with Django, Vue and Go.
- Using BSD make (Lobsters)
- What are the best books or resources on SRE automation that are also practical? (2022)
- Superblocks - IDE for Internal Apps, APIs and Cron Jobs. (HN)
- Multy - Easily deploy multi cloud infrastructure. Write cloud-agnostic config deployed across multiple clouds. (Web)
- Who's NOT using Kubernetes these days and want to share their exciting bit/tooling? (2022)