code.dblock.org | tech blog

2026-01-15T00:00:00+00:00

Dries Buytaert recently wrote about The Third Audience. For decades, websites have targeted two audiences: humans and search engines. AI agents are now the third audience, and most websites aren’t optimized for them yet.

AI agents prefer clean, structured content over HTML. Markdown is ideal - it’s readable, semantic, and free of navigation chrome. So I made this blog serve its source markdown files alongside the HTML.

How It Works

For every post like /2026/01/15/serving-markdown-for-ai-agents.html, you can now fetch the source at /2026/01/15/serving-markdown-for-ai-agents.md.

AI agents can discover this via a <link> tag in the HTML head:

<link href="serving-markdown-for-ai-agents.md"
      type="text/markdown"
      rel="alternate"
      title="Markdown">
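On the consuming side, an agent would look for that tag and resolve the relative `href` against the page URL. Here is a minimal sketch of that lookup; `markdown_alternate` is a hypothetical helper, and the regular expressions are a shortcut that only understands double-quoted attributes, so a real crawler would use a proper HTML parser.

```ruby
require 'uri'

# Hypothetical helper: return the absolute URL of the Markdown alternate
# advertised by a page, or nil if the page does not advertise one.
def markdown_alternate(html, page_url)
  html.scan(/<link\b[^>]*>/im).each do |tag|
    # Collect the tag's double-quoted attributes, e.g. { "rel" => "alternate" }.
    attrs = tag.scan(/([\w-]+)\s*=\s*"([^"]*)"/).to_h
    next unless attrs['rel'] == 'alternate' && attrs['type'] == 'text/markdown'
    next unless attrs['href']

    # Resolve the (possibly relative) href against the page's own URL.
    return URI.join(page_url, attrs['href']).to_s
  end
  nil
end
```

Given the tag above and the page URL of this post, the helper resolves the relative `href` to the `.md` file that sits next to the `.html` one.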

Implementation

Since this is a GitHub Pages blog, custom Jekyll plugins don’t work. Instead, I added a GitHub Actions workflow that builds Jekyll and copies the markdown source files to _site before deployment.

The workflow extracts the date and slug from each post filename and copies it to the matching URL path with a .md extension.
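The mapping itself is small. The sketch below is not the actual workflow; it assumes posts live in `_posts` with the usual Jekyll `YYYY-MM-DD-slug.md` naming and that the site uses date-based permalinks like the URL shown earlier. `markdown_target` and `copy_posts` are names made up for illustration.

```ruby
require 'fileutils'

# Map a Jekyll post filename to the path its Markdown copy would get in the
# built site, or nil if the filename does not look like a dated post.
def markdown_target(post_path, site_dir = '_site')
  pattern = /\A(\d{4})-(\d{2})-(\d{2})-(.+)\.(?:md|markdown)\z/
  match = pattern.match(File.basename(post_path))
  return nil unless match

  year, month, day, slug = match.captures
  File.join(site_dir, year, month, day, "#{slug}.md")
end

# Copy every dated post next to its generated HTML page.
def copy_posts(posts_dir = '_posts', site_dir = '_site')
  Dir.glob(File.join(posts_dir, '*')).each do |post|
    target = markdown_target(post, site_dir)
    next unless target

    FileUtils.mkdir_p(File.dirname(target))
    FileUtils.cp(post, target)
  end
end
```

Running `copy_posts` after `jekyll build` and before the deploy step would place each source file at the `.md` URL that matches its `.html` page.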

Should You Do This?

Dries raises a valid concern: are we just making it easier for AI companies to use our content without sending traffic back? Perhaps. But AI agents are already crawling our sites. Giving them cleaner input might lead to better attribution and more accurate responses that reference our work.

The web has always been about making information accessible. This is just the next evolution.

Plus, we will soon all work for AI anyway, so might as well get on its good side.

Identify the individual competent to make a decision

A decision requires clarity about who has the authority and expertise to make the final decision. It’s best to identify this person upfront, not after disagreements arise, usually through a strong sense of ownership in the organization.

When identifying the decision maker, remember that a local decision is faster and cheaper than a decision at a higher level, and you never want a team so helpless that it has to reach for the manager for every small decision.

Begin by separating “one way door” and “two way door” decisions. The former cannot be “walked back”, or only at great cost (e.g. adding a public API), while the latter can easily be undone (e.g. choosing a JSON parsing library). Critical one way door decisions will need more scrutiny, but two way door decisions can be made lower in the hierarchy. In both cases, look for the decision maker that is the true owner of the work in question. Who will be affected by this decision daily? Who is taking risks? Assuming we make the best decision, who will be celebrated as being right when the project is done? That should be the owner of the decision.

When making a significant one way door decision, look for a trusted tie breaker with authority in the hierarchy of the organization. At Amazon, Principal Engineers often step in as tie-breakers for any technical decision. For non-technical decisions use the organizational hierarchy to find the common denominator (e.g. a common manager). Be careful escalating the ask, though - the more senior the manager, the less context they have, and the more susceptible they are to the most articulate argument (form vs. substance). In a dysfunctional organization you will also find a lot of pass-through “leaders” who will refuse to make a decision and escalate to their manager, delaying the decision significantly. Watch a Director ask for more data, then question the existence of the whole project.

In general, I am of the opinion that a technical decision should almost never be made by a manager. A people manager carries organizational weight, often has veto power (can tell people what to do), usually has the least amount of detail regarding any technical decision no matter how technical they are, and will not be suffering the consequences of the decision on a daily basis.

Lay out the arguments for one decision vs. the other

Once the decision maker is identified, all parties must present their arguments clearly and respectfully. Good arguments include data, examples, risks, and benefits, and should be written down. The goal is to ensure the decision maker has all relevant information and perspectives before making their choice and to get on the same page (literally). This prevents decisions made in ignorance, and ensures that even unpopular choices are made with full awareness of the alternatives. Soliciting broad input at this stage creates more visibility and therefore is an opportunity to FYI the decision to your manager or a senior staff member. Don’t be scared, ask other people’s opinions now! Give everyone time to think about the problem and to sleep on it.

It’s critical that everyone, including yourself, can agree that each option is complete and viable. Instead of thinking how you dislike one option (the cons), think in terms of its advantages and disadvantages (both pros and cons) and learn to articulate the pros, too. Finally, if you are the decision maker, avoid writing “recommended” next to any of the options until the last moment, so as not to bias the group.

Let the decision maker decide

After all arguments are presented, the decision maker must be given the space and authority to make their choice without interference. This means no lobbying after the fact, no attempts to undermine the decision, and no passive-aggressive resistance.

Don’t lie by saying “I’d prefer the team to decide”, when, in fact, you’d prefer to decide, and don’t flex your decision making power, it just shows how insecure you are at wielding it. If you are a manager asked to make a decision in a room with subordinates, try saying “These are well laid arguments. I will let the team decide.” as much as possible, then side with the majority. And if you are the most senior member of the technical staff, never say “I am the Principal Engineer, therefore I decide” - everyone already knows it, and that just makes you look like a d*ck.

The decision maker should explain their reasoning, and document the decision. Once the choice is made, it becomes the team’s direction. This requires trust in the decision maker’s judgment and a commitment to respect their authority, even when the outcome differs from one’s personal preference.

Collectively commit to the outcome sought by the decision

Everyone, regardless of their initial position, must fully commit to implementing the chosen direction. This means actively working toward the success of the decision, not just grudgingly complying. You must be able to explain and advocate for the decision to others, as if it were your own choice. This collective commitment transforms a potentially divisive decision into a unified team effort, ensuring the organization moves forward together rather than being pulled apart by lingering disagreements.

If you disagreed with the option chosen, this is your time to earn trust and commit visibly. I recently disagreed with a decision made by a Principal Engineer to remove an emergency status from a project that gave the team extraordinary authority to pull additional resources in. This was an easy “disagree and commit” for me. The reasons were clearly laid out and while I had my pros and cons for the option to keep the emergency status, those were no longer relevant after the decision was made. After the PE made the decision I wrote: “This is an easy disagree and commit for me, because I do agree that the project had achieved its original goals, and understand that the emergency status may no longer be needed because problems should be resolved business as usual”. Since then, I have defended the decision like it was mine.

How to Disagree and Commit like Switzerland was originally published by Daniel Doubrovkine at code.dblock.org | tech blog on August 03, 2025.

2025-07-30T08:00:00+00:00

A famous example where telling Engineers what to do backfired was the Space Shuttle Challenger disaster in 1986. Engineers at Morton Thiokol, the contractor responsible for the shuttle’s solid rocket boosters, warned NASA management that the O-rings in the boosters could fail in cold weather. The night before launch, engineers strongly recommended delaying the launch due to unusually low temperatures. Management, under pressure to proceed, overruled the engineers’ concerns and told them to “make a recommendation based on data, not emotion.” Eventually, management told the engineers to sign off on the launch, despite their objections. The shuttle launched in cold weather, the O-rings failed, and the Challenger exploded, killing all seven astronauts on board.

This wasn’t a software problem, but plenty of software engineering disasters are documented: the Knight Capital Group trading loss (2012), where a rushed deployment caused a $460M loss; the Ariane 5 rocket failure (1996), where reused code not designed for the new rocket led to its destruction; the Therac-25 radiation overdoses (1985–87), where ignoring software safety warnings resulted in patient deaths; and the Healthcare.gov launch (2013), where ignoring technical advice led to a high-profile, catastrophic rollout. In each case, sidelining engineering judgment in favor of business or schedule pressures led to major failures.

I bet you have your own disaster story. To quote you, “I told you so!”.

Yet, engineering managers continue telling engineers what to do every day. And not just engineers - all subordinates. Sometimes, it’s time pressure. More often it is because managers are also engineers, and occasionally more experienced, so we think we just know better. Do we?

I tell my direct reports that there’s nothing I can make them do, but that there may be real consequences. I once refused to do something highly problematic my manager asked me to do, and instead said I’d think about it. It was a clever response, or so I thought. They responded that my refusal would become a performance problem, almost hung my remote exception over my head, and told me that I failed at “disagree and commit”. That’s about the moment I decided to leave that company.

Speaking of “disagree and commit”, telling people what to do, especially if they are in your direct reporting chain, must not be mistaken for it. A proper disagreement is usually amongst peers and begins with a careful understanding of multiple ways to do something, followed by different opinions on the option to take, and one person making a choice that others disagree with. One must be able to articulate clearly the reasons for choosing their least favored option, and be the messenger of that option moving forward, as if it were their preferred choice. A manager can be a tie breaker, but their power must be wielded with extreme caution.

For example, “disagree and commit” in the context of the Swiss government means that even when individual members of the Federal Council personally oppose a popular initiative, they publicly defend and implement it once it’s approved, acting as a unified body. This approach ensures stable governance and respect for democratic decisions, as officials set aside personal views to uphold the collective will and present a united front.

Everything else? That is just what it is, telling people what to do.

So, my advice is not to tell people what to do, unless they want you to, you are helping them move past a point of indecision, or you really strongly feel you must.

The first case is easy. Sometimes people want you to make a decision. For example, I ask others to tell me what to do because I think they are more competent than me. I always ask my lawyer what to do on legal matters, because I am not a lawyer. I like to provide input, but I will ultimately ask my manager to tell me what to do when it comes to organizational decisions that affect not just me, but also my peers. Of course, many people often would rather not make a decision because they don’t want to own it, to cover their ass. My usual response in those cases is that I’ll help you, but you decide. Note that the common denominator here is that the individual asks you to tell them what to do.

The second is also easy. Sometimes we waste a lot of time figuring out what to do. To quote Tobi at Shopify, “if you can’t decide, do what I say”. I kinda love it because it’s often helpful, and very startup like - we’d rather move fast more often than not.

The last case is much harder. Sometimes the decision is so bad, costly, or permanent, that you really must tell someone what to do. I tend to ask myself whether this decision is reversible or not (a 1-way door, or a 2-way door). For example, prioritizing a feature over another is not a 1-way door decision unless a customer is waiting on something by a deadline. Firing someone is literally a 1-way door. An API can be a 1-way door, but maybe we can find a compromise and label it experimental, turning a 1-way door decision into a 2-way door? Not doing something is usually a 2-way door.

Decisions of what to do belong to those doing the real work. Resist the urge to tell people what to do, and you’ll create a much more motivated team that feels like it truly owns all its decisions. A manager is always a bottleneck with more general context, but partial project information. Therefore, teams that can make decisions without consulting their manager will have better velocity, as they are not constantly waiting on a manager’s decision. Don’t tell people what to do, and they might just go a lot faster.

Don't Tell Engineers What to Do was originally published by Daniel Doubrovkine at code.dblock.org | tech blog on July 30, 2025.

2025-07-30T01:00:00+00:00

Let’s get Claude Code to connect to Google Sheets. We’ll use this Google Sample Spreadsheet.

Go to the Google Cloud Console, create a new project, and enable the Google Sheets API on it.

Install the Google Cloud CLI.

brew install google-cloud-sdk

Choose the newly created project on the command line.

~$ gcloud config set project gsheetscli

Updated property [core/project].

gcloud auth login --enable-gdrive-access

You can now access the spreadsheet with curl.

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://sheets.googleapis.com/v4/spreadsheets/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms/values/Class%20Data"

{
  "range": "'Class Data'!A1:V101",
  "majorDimension": "ROWS",
  "values": [
    [
      "Student Name",
      "Gender",
      "Class Level",
      "Home State",
      "Major",
      "Extracurricular Activity"
    ],
    ...
  ]
}

Let’s ask Claude to do something useful.

~$ claude

╭───────────────────────────────────────────────────╮
│ ✻ Welcome to Claude Code!                         │
╰───────────────────────────────────────────────────╯

> analyze the distribution of extracurricular activities in https://docs.google.com/spreadsheets/d/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms/edit?gid=0#gid=0

⏺ I'll analyze the distribution of extracurricular activities from the Google Sheets document you provided.

⏺ Fetch(https://docs.google.com/spreadsheets/d/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms/edit?gid=0#gid=0)
  ⎿  Received 193.5KB (200 OK)

⏺ The extracurricular activities distribution shows perfect balance:

  Activity Distribution:
  - Drama Club: 5 students (16.7%)
  - Basketball: 5 students (16.7%)
  - Debate: 5 students (16.7%)
  - Lacrosse: 5 students (16.7%)
  - Baseball: 5 students (16.7%)
  - Track & Field: 5 students (16.7%)

  Total: 30 students across 6 activities

  Each activity has exactly 5 students participating, resulting in an even distribution with no activity being more popular than others in this dataset.

Whoa.
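To sanity-check an answer like this, the same tally can be recomputed from the API response shown earlier. This is a minimal sketch that assumes the JSON has the `values` shape above, with the header in the first row; `column_distribution` is a hypothetical helper name.

```ruby
require 'json'

# Count how often each value appears in one column of a Sheets API
# `values` response. The first row is treated as the header.
def column_distribution(response_json, column_name)
  header, *rows = JSON.parse(response_json).fetch('values')
  index = header.index(column_name)
  raise ArgumentError, "no column named #{column_name}" unless index

  # Rows can be shorter than the header, so drop missing cells before counting.
  rows.map { |row| row[index] }.compact.tally
end
```

Piping the curl output into a script that calls `column_distribution($stdin.read, 'Extracurricular Activity')` prints a hash of activity names to counts, which can be compared against the numbers Claude reported.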

Troubleshooting

You need to gcloud auth login, otherwise you get a “Method doesn’t allow unregistered callers” error.

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://sheets.googleapis.com/v4/spreadsheets/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms/values/Class%20Data"

{
  "error": {
    "code": 403,
    "message": "Method doesn't allow unregistered callers (callers without established identity). Please use API Key or other form of API consumer identity to call this API.",
    "status": "PERMISSION_DENIED"
  }
}

You need a project that has access to Google Sheets. Otherwise you get a “Request had insufficient authentication scopes.” error.

{
  "error": {
    "code": 403,
    "message": "Request had insufficient authentication scopes.",
    "status": "PERMISSION_DENIED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "reason": "ACCESS_TOKEN_SCOPE_INSUFFICIENT",
        "domain": "googleapis.com",
        "metadata": {
          "method": "google.apps.sheets.v4.SpreadsheetsService.GetValues",
          "service": "sheets.googleapis.com"
        }
      }
    ]
  }
}

Select the project with gcloud config set project [name], enable access to Google Sheets with gcloud services enable sheets.googleapis.com, and log in again with gcloud auth login --enable-gdrive-access.
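If you script these calls, the two failures above can be told apart from the error body. A minimal sketch, assuming the response has the error shape shown in this section; `sheets_error_hint` is a hypothetical helper, and the hints simply restate the fixes described here.

```ruby
require 'json'

# Suggest a fix for the two Sheets API errors described above.
# Returns nil when the response is not an error.
def sheets_error_hint(response_json)
  error = JSON.parse(response_json)['error']
  return nil unless error

  # `details` is absent on some errors, so default to an empty list.
  reasons = Array(error['details']).map { |detail| detail['reason'] }
  if reasons.include?('ACCESS_TOKEN_SCOPE_INSUFFICIENT')
    'enable sheets.googleapis.com, then gcloud auth login --enable-gdrive-access'
  elsif error['message'].to_s.include?('unregistered callers')
    'run gcloud auth login'
  else
    "unrecognized error: #{error['message']}"
  end
end
```

Anything other than these two cases falls through to the last branch, which only echoes the message back.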

Using Claude Code with Google Sheets was originally published by Daniel Doubrovkine at code.dblock.org | tech blog on July 30, 2025.

2025-06-21T01:00:00+00:00

Six months ago I failed a basic coding interview at a FAANG. Yes, I was a Principal Engineer, and yes, I was paid absurd amounts of money, yet I couldn’t implement a diameter of a binary tree as a “warm up” exercise, 10 lines of code.

The interviewer was very nice about it, and was equally surprised. You see, I’ve been coding for 35 years, and I am not “rusty” at it, which is a typical excuse for senior ICs bombing LeetCode interviews. At that time I was writing code every day. I simply froze, and couldn’t do it. All I could think of was “why the hell am I here doing this to myself?”. I couldn’t turn my brain around, apologized, cut the interview short, and, while I did ace several coding interviews at other companies for IC roles, went back to management for which I am, obviously, more qualified.
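For reference, the warm-up does fit in about ten lines. This is one common solution, sketched under the assumption that the diameter is measured in edges on the longest path between any two nodes; the `Node` struct is made up for illustration and is not what the interviewer asked for.

```ruby
Node = Struct.new(:value, :left, :right)

# Returns the diameter of the tree rooted at `root`, measured in edges.
def diameter(root)
  best = 0
  # height.call(node) returns the number of nodes on the longest path
  # going down from `node`, and updates `best` along the way.
  height = lambda do |node|
    return 0 if node.nil?

    left = height.call(node.left)
    right = height.call(node.right)
    best = [best, left + right].max
    1 + [left, right].max
  end
  height.call(root)
  best
end
```

The trick is that the longest path through any node is the sum of the heights of its two subtrees, so one depth-first pass is enough.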

Did you just bomb a coding interview? Here’s some practical advice on how to deal with it.

First, keep your head high and do not let this negative experience get to you. It feels awful, but unless you are blogging about your failure to implement the diameter of a binary tree, nobody other than you and the interviewer will ever know. Most likely the interviewer was also a half asleep drone deep inside the borg, administering a coding puzzle every day to the next sucker, so they will forget about you the next day.

I see a lot of junior Engineers affected by failing at these. If you are recently out of college, it’s helpful to think of these interviews as a school test, such as an SAT. I never had to take SATs (my children did), but most European schools have oral exams where you get a ticket from a jar with the problem statement. I had slept through too many alarms to attend an 8 AM physics class around 1996 because of a grueling late night partying schedule, and only really knew maybe half the material. Luckily, 2 of the 3 questions on my ticket overlapped with the classes I did attend, and so I did well.

If you don’t like school and prefer video games, think of these interviews as Crossy Road. You need some skill, but mostly you just need some luck. There’s a set of problems you know how to solve, and a set you don’t. Practicing is important, but nothing helps like the fact that I was asked to implement an LRU Cache twice in the last decade in coding interviews. You also need to only cross one road. Failed? Try to cross half a mile down.

If you are a junior IC, remember that these LeetCode-style interviews are dumb and there’s nothing you can do about it. Fifteen years ago I wrote a viral blog post entitled GitHub is Your New Resume. Nothing has changed since. When hiring from open-source at my last job at AWS, I tried to swap our team’s coding interview requirement for an offline evaluation of the candidate’s open-source code. Because, you know, these are active open-source contributors and we are sure they can code! Naturally, I failed to change anything. I was told that we needed a consistent interviewing process to evaluate everyone the same way. And so we consistently hired people that could code on a whiteboard, a complete self-own.

If you are a senior IC, remember that you are not paid to write diameters of binary trees. You don’t really want to work at a company that gives you that type of problem in an interview. The job of a Principal Engineer is to have resounding impact on a time scale of ~1 year. You pick the most important unsolved problem and you solve it, long term. This may include prototyping a meaningful piece of software that changes the course of a business, or gaining support from senior leadership for a directional alignment for hundreds of people. It could be enabling teams in ways they couldn’t think of, and so many other ways to have the kind of impact that people with decades of experience tend to occasionally produce. None of it is the kind of code ~~a junior Engineer~~ an AI can crank out in seconds in 2025.

Speaking of AI, we have one chance to get rid of these stupid interviews now. It has become very clear that only companies that embrace AI will survive, so the best way to evaluate a candidate is to let them use AI, which gets rid of having to implement solutions to these problems in coding interviews.

Good luck with your interviews!

I Failed to Implement the Diameter of a Binary Tree in a Coding Interview was originally published by Daniel Doubrovkine at code.dblock.org | tech blog on June 21, 2025.

2025-06-21T01:00:00+00:00

One of my colleagues wrote a pretty awesome tool called claude-swarm that orchestrates multiple Claude Code instances as a collaborative AI development team. At Shopify, we are attempting to use it to generate Ruby unit tests at some scale with an army of AI test agents (think a “Ruby Expert” paired with a “TDD Practitioner” and a “Code Review Nitpicker”). But for the purposes of this post, let’s just upgrade Ruby in a few projects.

First, ensure that you have a working version of command-line Claude Code with a monthly subscription, since you will be having a lot of tokens for breakfast.

$ claude "Say hello."
╭───────────────────────────────────────────────────╮
│ ✻ Welcome to Claude Code!                         │
╰───────────────────────────────────────────────────╯

> Say hello.

⏺ Hello! I'm Claude Code, ready to help you with your software engineering tasks. What can I assist you with today?

Install the claude_swarm gem. Note that the gem name has an _.

$ gem install claude_swarm

$ claude-swarm --help

Commands:
  claude-swarm clean                 # Remove stale session symlinks
  claude-swarm generate              # Launch Claude to help generate a swarm configuration interactively
  claude-swarm help [COMMAND]        # Describe available commands or one specific command
  claude-swarm init                  # Initialize a new claude-swarm.yml configuration file
  ...

Create a claude-swarm.yml.

version: 1
swarm:
  name: "Ruby Upgrade Swarm"
  main: lead_developer
  before:
    - "echo 'Getting started ...'"
  instances:
    lead_developer:
      description: "Lead developer to coordinate Ruby upgrades."
      directory: ..
      model: opus
      allowed_tools:
        - Bash
      prompt: |
        Find the Ruby projects in this directory.
        Do not recurse further than 3 levels deep.
        For each project, identify the Ruby version used.
        Print the project name next to the ruby version only.
        For the first 5 projects with a Ruby version older than 3.5.5 use ruby_upgrader to upgrade the Ruby version.
      connections: [ruby_upgrader]

    ruby_upgrader:
      description: "Upgrades a Ruby project."
      model: opus
      allowed_tools:
        - Bash
      prompt: |
        1. Upgrade the version of Ruby in this project to 3.5.5. You can `rvm use 3.5.5` to switch to this Ruby version.
        2. Ensure that `bundle install` works.
        3. Print the current project directory name and a confirmation that you upgraded the version.

My prompts are very sloppy, but YOLO.
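One place where sloppiness can bite is the “older than 3.5.5” comparison, which the prompt leaves entirely to the model. A deterministic check is easy to sketch with `Gem::Version`, which ships with RubyGems. This assumes the version is pinned with a `ruby 'x.y.z'` line in the Gemfile, and both helper names are made up for illustration.

```ruby
require 'rubygems'

# Extract the Ruby version a project pins in its Gemfile, or nil if none.
def gemfile_ruby_version(gemfile_text)
  match = gemfile_text.match(/^\s*ruby\s+['"]([\d.]+)['"]/)
  match && Gem::Version.new(match[1])
end

# True when the Gemfile pins a Ruby version older than `target`.
def needs_upgrade?(gemfile_text, target)
  current = gemfile_ruby_version(gemfile_text)
  !current.nil? && current < Gem::Version.new(target)
end
```

`Gem::Version` compares segment by segment, so it treats 3.3.5 as older than 3.5.5, and a Gemfile without a `ruby` line is simply skipped.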

Run it.

$ claude-swarm

Starting Claude Swarm from claude-swarm.yml...
🐝 Starting Claude Swarm: Ruby Upgrade Swarm

✓ Generated MCP configurations in session directory

⚙️  Executing before commands...

✓ Before commands completed successfully

🚀 Launching main instance: lead_developer
   Model: opus
   Directory: /Users/dblock/source/...
   Allowed tools: Bash
   Connections: ruby_upgrader

╭───────────────────────────────────────────────────╮
│ ✻ Welcome to Claude Code!                         │
╰───────────────────────────────────────────────────╯

> Find the Ruby projects in this directory.
  Do not recurse further than 3 levels deep.
  For each project, identify the Ruby version used.
  Print the project name next to the ruby version only.
  For the first 5 projects with a Ruby version older than 3.5.5 use ruby_upgrader to upgrade the Ruby version.

  Now just say 'I am ready to start'

⏺ I am ready to start

⏺ Update Todos
  ⎿  ☐ Find Ruby projects in directory (max 3 levels deep)           
     ☐ Identify Ruby version for each project
     ☐ Upgrade first 5 projects with Ruby < 3.5.5 using ruby_upgrader
     ☐ Print project names with Ruby versions

⏺ Update Todos
  ⎿  ☐ Find Ruby projects in directory (max 3 levels deep)           
     ☐ Identify Ruby version for each project
     ☐ Upgrade first 5 projects with Ruby < 3.5.5 using ruby_upgrader
     ☐ Print project names with Ruby versions

⏺ Task(Find Ruby projects)
  ⎿  Done (21 tool uses · 38.0k tokens · 2m 51.5s)

⏺ Update Todos
  ⎿  ☒ Find Ruby projects in directory (max 3 levels deep)           
     ☐ Identify Ruby version for each project
     ☐ Upgrade first 5 projects with Ruby < 3.5.5 using ruby_upgrader
     ☐ Print project names with Ruby versions

⏺ Based on the comprehensive analysis, here are the Ruby projects with their versions:

  Projects with Ruby < 3.5.5:
  - slack-amber-alert: 2.6.2
  - slack-api-explorer: 2.7.7
  - slack-iad: 2.6.5
  - slack-market: 2.7.7
  - slack-ruby-client (3 forks): 2.7.6
  - slack-ruby-bot-server-sample (dblock): 2.6.6
  - slack-ruby-bot-server-sample (slack-ruby): 2.6.2
  - slack-ruby-bot-server-stripe-sample: 2.6.6
  - slack-ruby-bot-server-events-app-mentions-sample: 2.6.6
  - slack-ruby-bot-server-events-sample (2 forks): 3.1.1
  - slack-app-error-showcase: 3.1.1
  - slack-rails-bot-starter: 2.7.1
  - slack-shellbot (slack-ruby): 2.7.7

  Projects with Ruby >= 3.5.5:
  - slack-arena: 3.3.5
  - slack-gamebot: 3.3.5
  - slack-gamebot2: 3.3.5
  - slack-invite: 3.3.5
  - slack-moji: 3.3.5
  - slack-strava: 3.3.5
  - slack-sup: 3.3.5
  - slack-sup2: 3.3.5
  - slack-shellbot (dblock): 3.3.5

  Projects without specified versions:
  - 25 gem library projects (slack-ruby-bot, slack-ruby-bot-server variants, etc.)

⏺ Update Todos
  ⎿  ☒ Find Ruby projects in directory (max 3 levels deep)           
     ☒ Identify Ruby version for each project
     ☒ Print project names with Ruby versions
     ☐ Upgrade first 5 projects with Ruby < 3.5.5 using ruby_upgrader

⏺ Now upgrading the first 5 projects with Ruby < 3.5.5:

⏺ ruby_upgrader:task (MCP)

⏺ Read(slack-amber-alert/Gemfile)
  ⎿  Read 38 lines (ctrl+r to expand)

⏺ Update(slack-amber-alert/Gemfile)
  ⎿  Updated slack-amber-alert/Gemfile with 1 addition and 1 removal
       1    source 'http://rubygems.org'
       2    
       3 -  ruby '2.6.2'
       3 +  ruby '3.3.5'
       4    
       5    gem 'grape'
       6    gem 'hashie'

⏺ Read(slack-api-explorer/Gemfile)
  ⎿  Read 43 lines (ctrl+r to expand)

⏺ Update(slack-api-explorer/Gemfile)
  ⎿  Updated slack-api-explorer/Gemfile with 1 addition and 1 removal
       1    source 'http://rubygems.org'
       2    
       3 -  ruby '2.7.7'
       3 +  ruby '3.3.5'
       4    
       5    gem 'giphy'
       6    gem 'irb'

⏺ Read(slack-iad/Gemfile)
  ⎿  Read 47 lines (ctrl+r to expand)

⏺ Update(slack-iad/Gemfile)
  ⎿  Updated slack-iad/Gemfile with 1 addition and 1 removal
       1    source 'http://rubygems.org'
       2    
       3 -  ruby '2.6.5'
       3 +  ruby '3.3.5'
       4    
       5    gem 'grape'
       6    gem 'hashie'

⏺ Read(slack-market/Gemfile)
  ⎿  Read 53 lines (ctrl+r to expand)

⏺ Update(slack-market/Gemfile)
  ⎿  Updated slack-market/Gemfile with 1 addition and 1 removal
       1    source 'http://rubygems.org'
       2    
       3 -  ruby '2.7.7'
       3 +  ruby '3.3.5'
       4    
       5    gem 'cachy'
       6    gem 'gruff'

⏺ Read(slack-ruby-client/dblock/.ruby-version)
  ⎿  Read 2 lines (ctrl+r to expand)

⏺ Update(slack-ruby-client/dblock/.ruby-version)
  ⎿  Updated slack-ruby-client/dblock/.ruby-version with 1 addition and 1 removal
       1 -  2.7.6
       1 +  3.3.5

⏺ Update Todos
  ⎿  ☒ Find Ruby projects in directory (max 3 levels deep)           
     ☒ Identify Ruby version for each project
     ☒ Upgrade first 5 projects with Ruby < 3.5.5 using ruby_upgrader
     ☒ Print project names with Ruby versions

⏺ Completed upgrading the first 5 projects with Ruby < 3.5.5:

  1. slack-amber-alert: 2.6.2 → 3.3.5
  2. slack-api-explorer: 2.7.7 → 3.3.5
  3. slack-iad: 2.6.5 → 3.3.5
  4. slack-market: 2.7.7 → 3.3.5
  5. slack-ruby-client (dblock): 2.7.6 → 3.3.5

  All projects have been successfully upgraded to Ruby 3.3.5.

  → MCP server mcp_ruby_upgrader (PID: 12294) already terminated
✓ Cleanup complete

You get the idea. Who is building a registry for reusable AI specialists?

Honestly, wow.

Using Claude-Swarm to Upgrade Ruby Projects was originally published by Daniel Doubrovkine at code.dblock.org | tech blog on June 21, 2025.

2025-05-18T00:00:00+00:00

There’s much written about becoming a first time Engineering Manager (I recommend Camille’s book), but little about the second time.

To my own surprise, I am back to being a manager after almost 6 years of writing code as an individual contributor (IC). Two months ago I joined the Developer Productivity organization at Shopify to create a new team called “Augmented Engineering”. We’ve been busy building, and have recently released an open-source tool to execute structured A.I. workflows, called Roast. This post is not about A.I., so let’s get back to talking about my motivations, and the pros and cons of going back from IC to Manager.

The Manager Path

My manager path was a typical one. I was a strong individual contributor (or so I was told), which led to an increase in team coordination responsibilities, culminating in a promotion to Team Lead about 25 years ago. My very first manager was my best manager and a fantastic role model - I had a niche product idea that he noticed and helped get funded, which had me directly managing people, being responsible for their careers and compensation. My scope then grew to Director (a manager of managers), then Head of Engineering, and finally CTO a decade ago. I have, of course, burned out, and decided it was healthier to get paid more and have less stress, and so I went back to being an individual contributor, joined AWS in 2019, and wrote about the pros and cons of going from management back to IC. I then told myself that I would never go back to management.

Until now.

Managerial Dysfunction

Over the past 6 years, I have been exposed to hundreds of teams and witnessed many struggle with poor management. Working across many teams was in the job description, and I enjoyed offering my time to dozens of people in 1:1s. Not unexpectedly, many sought career advice from me - I was a peer, yet a former CTO. I asked - “Are you not getting career advice from your manager?”. Some were, but many more complained about being treated as just another number, stuck in the middle of organizational dysfunction, mostly ignored by their management chain. They felt like “resources” and were called that name. The depth of the pain was surprising. Hearing “My manager sucks.” was common. In debugging this problem, I found out that the unhappy ones were almost always forcefully re-organized under their current manager, as opposed to being actively sought and hired by them. The common denominator was that they never chose their manager - the latter was either hired or inserted above their head, replacing their original hiring manager. In my own experience, while I did not necessarily feel the same way, my last manager was hired and inserted above me: I had not interviewed them, was not given a choice, was never asked a question about the individual being brought in, and had no say in their hiring before that decision was made.

Sometimes one gets lucky, but most of the time one does not. Twice a week over a period of several years I had an IC get caught in a kerfuffle involving some team dynamic or company-wide decision and get screwed over, often by their manager, pouring out their grievances to me in a 1:1, unable to openly criticize their manager, which would be career suicide. In Russian we say that “criticizing management is like peeing against the wind”.

In a recent discussion with an Engineer I heard this (paraphrased): “I really wanted to finish my project, but in the last team meeting a colleague of mine, who never swears, started swearing out of sheer frustration. I got another job offer and told the recruiter to make my start date as early as possible.” This person had 3 managers in less than a year. Each was a downgrade from the previous one, the last failing to understand the technology being built at any level of depth, and making up unreasonable technical asks and timelines. The least capable managers have not done any IC work for years, are rarely able to grasp the depth of the technical problems being solved, yet make plenty of decisions around them. They can’t find another job, and spend all their energy playing politics, managing up, and torturing their directs.

The Return to Office Debacle

The second issue was the Return to Office (RTO) mandate. I have not had a 1:1 with anyone in the last 2 years where this wasn’t the main topic of conversation. What a waste of time and energy! The desire to bring people back to an office was reasonable, but the way it was implemented was a huge mistake and was not customer obsessed. Instead of doing it “by example”, where the managers would build real opportunities to collaborate in-person, people were told to badge in like children at school. Remote managers were written off, everybody knew their headcount would not grow, and attrition was not backfilled remotely. As a consequence, some of the best people were lost. Remote team sizes dwindled through death by a thousand cuts, then the remainder were forced to move to a hub, or find another job. Commutes became unbearable in large cities, especially painful for my colleagues in India or the Pacific Northwest. People would spend their days on video conferences from the office working across the continents, adding to the frustration. Ironically, I had a remote exception, all while living in the middle of Manhattan, a five-minute walk away from my old office where I had no team. Naturally, every junior IC was envious of my situation and found it completely unfair, which it was. “What do you care? You have a remote exception.”, two senior managers told me. Talk about empathy!

My Decision to Return

To conclude, I had finally realized that my people skills were desperately needed while everyone was coming down from the post-pandemic roller coaster. And so, I became a manager again. I think I am capable of building great teams, as opposed to watching existing good ones burn out.

Pros and Cons of Being a Manager (Again)

The cons of being a manager are obvious. It’s a lot more stressful. My team does all the work and I am not in full control of my outcomes, yet responsible for all the failures. The pros sum up to feeling more like a real human, seeing people grow, enjoying working on technology I care about, and witnessing success when enabling a team that does their best work.

Increased Multitasking and Stress

This is one of the biggest cons. I do feel more stress as a manager because of the constant context switching, direct reports who routinely vent to me about their personal lives and frustrations, and exposure to more gossip and politics. Today I have 3 teams doing very different things at different stages of assembly, and have to stash and restore context in and out of my frontal lobe multiple times a day, all while facing demanding senior leadership and a lot of incoming pressure from multiple directions.

To mitigate this I try not to be anyone’s psychiatrist, and spend more time being the face of the organization. I force myself to work normal hours, protect time to focus on tasks that require flow, and force myself to write some code, and do other “real” work with my own hands. I also take ownership of problems visibly, and accept that being responsible for my team’s failures is simply my job.

Feeling Good About People Work

Whereas I had lots of 1:1s with my fellow Engineers, I was mostly tasked with producing software as an IC. As a manager, I am back to having to listen to people as part of my role. This forces me to approach my 1:1s a lot more methodically, to follow up on people problems, resolve conflicts, inspire each individual to be their best at work, and enable them in any way I can. As a consequence, it feels great to see everyone on a team I built work well together and be inspired.

Working with People Rather than Computers

I still enjoy writing code after 35+ years of it, but I am learning less now, even though technology moves so fast and everything feels new again with A.I. A lot of the craft for an experienced Engineer can become a little boring, and I am not as motivated as someone who is in the middle of learning the exciting parts of it. On the other hand, the younger people today are quite different from me, and often surprise me in more ways than one. Working with people as my primary job can be interesting in many new ways.

Feeling Control

As a senior manager I set direction and have significant amounts of organizational power. I know where the bodies are being buried because I hold the shovel. I make many decisions that affect others. I have access to detailed information, know early when someone important is quitting or being fired. I have a word in hiring, letting go and promoting. This comes with agonizing headaches, such as not being able to tell someone that they are about to lose their job, but there’s a better sense of control, and maybe even some level of psychological safety for myself. This is quite personal, but I do feel better when I sense I have a good grip on things.

The Future of Empire Building

I don’t know how to conclude this post, so I will only say that now that I am a manager again, I might as well build an empire. I could divide and conquer my team into people I like, and people I don’t like, and terrorize the latter. Of course, this is a joke, but I like to think about this empire-building concept that is often associated with senior managers as a signal. The moment I feel that I am building, or have built, an empire is the moment to become an IC again, with its own pros and cons.

Pros and Cons of Going from Individual Contributor Back to Manager was originally published by Daniel Doubrovkine at code.dblock.org | tech blog on May 18, 2025.

2025-05-10T01:00:00+00:00

Last week, my team at Shopify released a new tool called Roast, a convention-oriented framework for creating structured AI workflows. Our vision is to enable A.I. agents to solve developer productivity problems at scale. Imagine continuously improving unit tests, or optimizing test performance across hundreds of thousands of tests with minimal human intervention!

As we embarked on this journey, we quickly learned that, like many junior Engineers, A.I. needs some help focusing, and may not be ready to roam free around millions of lines of code and not hallucinate random results. So we made Roast, which gave A.I. some structure.

In the following example I will grade a Ruby unit test.

If you don’t have one already, get an OpenAI key from here. You will need an account with a credit card. Make sure that a basic completion works.

export OPENAI_API_KEY=sk-proj-....

curl -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{"model":"gpt-4.1-mini","messages":[{"role":"user","content":"What is 1+1?"}]}' \
    https://api.openai.com/v1/chat/completions

This should produce a chat.completion response.

{
  "id": "chatcmpl-BVdeKT88rtgxDCId9jFIpCr5sJQKv",
  "object": "chat.completion",
  "created": 1746880280,
  "model": "gpt-4.1-mini-2025-04-14",
  ...
}
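For readers who prefer a script over curl, here is a minimal sketch of the same request in Python, using only the standard library. The function name `build_chat_request` is hypothetical and only illustrates the shape of the call: the endpoint, model name and prompt are taken from the curl command above.

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"


def build_chat_request(prompt, api_key, model="gpt-4.1-mini"):
    """Build (but do not send) the same POST request as the curl command."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To try it, read the key with `os.environ["OPENAI_API_KEY"]`, pass it to `build_chat_request("What is 1+1?", key)`, and hand the result to `send`.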

Check out Roast. You can gem install roast-ai from Rubygems, but I am going to reuse an example workflow, so git clone git@github.com:Shopify/roast.git and cd roast.

The test grading workflow is a senior software engineer and testing expert that evaluates the quality of a test based on guidelines. It’s configured to use gpt-4.1-mini to analyze coverage, but o3, a more expensive and capable model, to generate recommendations. The free OpenAI tier doesn’t have access to these, so I have edited the file to use gpt-4.1-mini everywhere (model: gpt-4.1-mini). I also told it to use my OpenAI token from OPENAI_API_KEY (api_token: $(echo $OPENAI_API_KEY)).

diff --git a/examples/grading/workflow.yml b/examples/grading/workflow.yml
index d62a3d0..a10c7bc 100644
--- a/examples/grading/workflow.yml
+++ b/examples/grading/workflow.yml
@@ -1,5 +1,8 @@
 name: Grading current test changes
 
+api_token: $(echo $OPENAI_API_KEY)
+model: gpt-4.1-mini
+
 tools:
   - Roast::Tools::Grep
   - Roast::Tools::ReadFile
@@ -21,16 +24,13 @@ steps:
 
 # set non-default attributes for steps below
 analyze_coverage:
-  model: gpt-4.1-mini
   auto_loop: false
   json: true
   
 generate_grades:
-  model: o3
   json: true
 
 generate_recommendations:
-  model: o3
   auto_loop: false
   json: true
   params:

You may also need shadowenv and rg, so brew install shadowenv and brew install rg.

Let’s run Roast on one of its own tests!

~/source/shopify/roast (main)$ ./exe/roast execute examples/grading/workflow.yml test/roast/resources_test.rb 
🔥🔥🔥 Everyone loves a good roast 🔥🔥🔥

Configuring API client with token from workflow
Starting workflow...
Workflow: roast/examples/grading/workflow.yml
Options: {}
Running workflow for file: test/roast/resources_test.rb
Executing: read_dependencies (Resource type: none)
🔍 Searching for file: resources.rb
📖 Reading file: lib/roast/resources.rb
Executing: run_coverage (Resource type: none)
Requiring step file: roast/examples/grading/run_coverage.rb
Run options: --seed 19099

# Running:

.............

Finished in 0.000992s, 13104.8373 runs/s, 16129.0306 assertions/s.

13 runs, 16 assertions, 0 failures, 0 errors, 0 skips
Executing: analyze_coverage (Resource type: none)
Executing: verify_test_helpers (Resource type: none)
Executing: verify_mocks_and_stubs (Resource type: none)
🔍 Grepping for string: def 
Executing: generate_grades (Resource type: none)
Executing: calculate_final_grade (Resource type: none)
Requiring step file: roast/examples/grading/calculate_final_grade.rb
Executing: format_result (Resource type: none)
Requiring step file: roast/examples/grading/format_result.rb
Executing: generate_recommendations (Resource type: none)
🔥🔥🔥 ROAST COMPLETE! 🔥🔥🔥
Final output saved to: roast/.roast/sessions/grading_current_test_changes/resources_test_rb_13224ed7/20250510_122040_774/final_output.txt
Final output saved to: roast/.roast/sessions/grading_current_test_changes/resources_test_rb_13224ed7/20250510_122040_774/final_output.txt

The test grade report is quite impressive.

========== TEST GRADE REPORT ==========
Test file: test/roast/resources_test.rb

FINAL GRADE:
  Score: 80/100
  Letter Grade: B 

RUBRIC SCORES:

  Line Coverage (10% of grade):

    Value: 9.0

    Score: 9/10 - "The source file has 82.93% line coverage, which is good coverage with only minor or edge case code paths remaining untested."

  Method Coverage (10% of grade):

    Value: 10.0

    Score: 10/10 - "The source file has 100% method coverage, indicating all methods are being tested."

  Branch Coverage (30% of grade):

    Value: 6.0

    Score: 6/10 - "The source file has 62.96% branch coverage, indicating some conditions lack testing for all outcomes, especially in conditional branches related to resource type detection."

  Test Helpers Usage (10% of grade):

    Value: 10.0

    Score: 10/10 - "No custom test helpers are used in the test file, so there are no issues related to helper usage."

  Mocks and Stubs Usage (10% of grade):

    Value: 10.0

    Score: 10/10 - "No mocks or stubs are used in the test file, so there are no issues related to mocking or stubbing."

  Test Readability (10% of grade):

    Value: 8.0

    Score: 8/10 - "Test names are descriptive and clearly communicate the expected behavior, though the structure could be improved with more context blocks or grouping."

  Test Maintainability (10% of grade):

    Value: 8.0

    Score: 8/10 - "Tests focus on behavior and public interfaces without coupling to implementation details, making them reasonably maintainable."

  Test Effectiveness (10% of grade):

    Value: 7.0

    Score: 7/10 - "Tests cover the main expected behaviors and resource types, but some edge cases and error conditions remain untested, as indicated by uncovered branches."
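The final score is consistent with a plain weighted average of the rubric scores. The sketch below assumes that is how the grade is combined. I have not verified this against Roast's `calculate_final_grade.rb`, and the `RUBRIC` table and `final_score` function are hypothetical names that simply restate the weights and scores from the report above.

```python
# (weight in percent, score out of 10), copied from the report above.
RUBRIC = {
    "Line Coverage": (10, 9),
    "Method Coverage": (10, 10),
    "Branch Coverage": (30, 6),
    "Test Helpers Usage": (10, 10),
    "Mocks and Stubs Usage": (10, 10),
    "Test Readability": (10, 8),
    "Test Maintainability": (10, 8),
    "Test Effectiveness": (10, 7),
}


def final_score(rubric):
    """Weighted average of the rubric scores, scaled to a 0-100 range."""
    total_weight = sum(weight for weight, _ in rubric.values())
    weighted_sum = sum(weight * score for weight, score in rubric.values())
    return weighted_sum * 10 / total_weight


print(final_score(RUBRIC))
```

Branch coverage carries three times the weight of any other criterion, so its 6/10 accounts for most of the points lost.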

Using more expensive models takes a little longer to run and I get a “C” grade instead of a “B”. Static analysis, such as branch coverage, gets the same score, while test effectiveness goes from 7 to 6. The better model finds untested edge cases that may cause undetected regressions. Compare “Tests cover the main expected behaviors and resource types, but some edge cases and branches remain untested, limiting full validation of all possible inputs.” to “Core behaviors (file, directory, url, api, none detection) are covered, but several branches - such as command targets and glob-pattern edge cases - remain untested, leaving room for undetected regressions.”.

Honestly, wow.

Executing Structured A.I. Workflows with Shopify Roast was originally published by Daniel Doubrovkine at code.dblock.org | tech blog on May 10, 2025.

2025-04-05T01:00:00+00:00

You may have landed here because I had just apologized for my obsessiveness over punctuation in a GitHub comment and requested changes to your pull request, or you noticed me adding periods to a sentence in the otherwise excellent document you have authored. This is not the first time, and my condition manifests itself in some interesting ways.

Consider the following, seemingly well-formed sentence.

This is what I actually see.

This must be, obviously, a you problem! How can you fix it? Add a period to this complete sentence.

Quite seriously, the gist of the issue is that I am simply not capable of reading text that is not properly punctuated. It’s not you, it’s me. Please add a period.

My rule of thumb for dealing with this condition is that a complete sentence should end with a period. Don’t add a period to “the quick brown fox jumps over the lazy dog” or “The Quick Brown Fox Jumps Over The Lazy Dog”.

Consistently punctuate list items, such as either all items are full, capitalized sentences, or none of them are. The following are good examples.

1. One quick brown fox jumps over the lazy dog.
2. Two quick brown foxes jump over the lazy dog.

- one quick brown fox jumps over the lazy dog
- two quick brown foxes jump over the lazy dog

After a very deep breath I may be able to tolerate a period at the end of a list, as long as the items are capitalized consistently, but this is not recommended. Periods are cheap.

One quick brown fox jumps over the lazy dog
Two quick brown foxes jump over the lazy dog
Three quick brown foxes jump over the lazy dog.

There’s “attention to detail” and there’s “obsessing over periods”. I am sorry for my obsession, but I do appreciate your attention to detail.

While you are here, you should also know that I am a compulsive sorter. For example, I organize my personal documents and photos by year, going back to 1990. Each year has a subfolder for each of the months.

There’s more.

In 2016 I had to invent a structure for my local git clones after having an episode. I have hundreds of labels to sort through GitHub notifications.

The posts in this blog are similarly organized, and the fact that there’s both a _posts and a posts folder in Jekyll is currently very triggering. I am thrown off by lines wrapping in markdown, and I wrote a Danger plugin to ensure consistent sentence formatting in a project’s CHANGELOG.md.

This condition spills into real life, as I obsessively classify the smallest of my expenses into dozens of custom categories in the personal finance app Monarch, including “Electric Scooter”, “Kids Lunch” and “Kitchen Utensils”.

While writing this post, I also realized that I am not as triggered by lack of order or punctuation in books. I quite enjoy the chaos of William S. Burroughs’ “Naked Lunch”. I wonder what it is about computers that causes this in me?

Apologizing for My Obsessiveness Over Punctuation was originally published by Daniel Doubrovkine at code.dblock.org | tech blog on April 05, 2025.

2025-02-10T01:00:00+00:00

Friday, February 14th is my last day at Amazon.

I’ve been at Amazon for over 5½ years, and it’s been a great run. I came from 8 years of leading technology at Artsy, and wanted to go back to coding. I found a Principal Engineer role to help launch the new AWS Data Exchange service. I was excited to work on a marketplace (Artsy is a marketplace for fine art), and the idea of a service that could connect data providers and data consumers at Amazon scale was big. Plus, I was finally going to learn how Amazon, and especially AWS, was so relentlessly successful, from within.

I half-jokingly talked of this as semi-retirement, because being an individual contributor at a F.A.A.N.G. seemed a lot easier than being CTO of a company that had grown from seed stage to Series D, raising $100MM. It was certainly different. Six months into my IC role the service had launched and I felt that I had made the right decision. I didn’t write enough code though, and was missing working on open-source (Artsy was open-source by default), so I moved to OpenSearch, a fork of Elasticsearch sponsored by Amazon, in 2021. That year was incredibly exciting work, getting the product off the ground, followed by steady growth, culminating with OpenSearch Project joining the Linux Foundation in 2024. In thinking about what to do next, I was not sure I could beat helping convince a 2.5 trillion dollar company to release control of a key project to a non-profit.

What have I learned over the years? If anything, it’s that Amazon is a unique company that is capable of changing its mind, but it’s not for the fainthearted. If you are considering a job there, I recommend it.

Allow me to share a few thoughts from my tenure.

Nobody is irreplaceable, but people make all the difference.
A successful business is a simple one, and does what customers are asking, and willing to pay for.
Writing well is a superpower, and a useful tool for influencing people.
If you are passionate about what you’re building, it will not feel like work.
Most people will recognize and appreciate your passion, and maybe even give you the benefit of the doubt.
There is a time and place where criticism can bring value. Be sure you know your intention or value before giving it, or risk losing trust.

Finally, I’m forever grateful for my colleagues, the company leaders, and the Amazon PE community.

So, what’s next? We shall see! My criteria for a new role are as follows.

I want to work for a company that has a high NPS score of both employee and customer happiness.
I want to continue increasing my time living outside of NYC, especially in Latin or South America, and therefore be able to do much of my focused work remotely.

I recognize that I am privileged and extremely lucky to have opportunities where both of these can be met.

Onward!

Leaving Amazon was originally published by Daniel Doubrovkine at code.dblock.org | tech blog on February 10, 2025.

2025-01-20T01:00:00+00:00

Should you work on weekends?

tl;dr Only 2% of my paid work was accomplished on weekends, which matches my opinion that you should not.

Collecting Data

First, let’s try to use data and find out whether I actually work on weekends. I’ve contributed to the open-source OpenSearch since 2021 as part of my day job, so I tried looking through my GitHub commits, but had a hard time excluding forks. I settled on fetching pull requests. The following query collects my PRs created between January 1 and January 20, 2025.

gh search prs \
  --author "@me" \
  --created "2025-01-01..2025-01-20" \
  --json repository,number,title,createdAt \
  --template '{{range .}}{{tablerow
    (timefmt "Monday" .createdAt)
    .repository.nameWithOwner 
    (printf "#%v" .number) 
    (truncate 20 .title)
    (timeago .createdAt)
    }}{{end}}'

Sunday     dblock/discord-strava                            #32   Lock setup-firefox.   1 day ago
Friday     dblock/discord-strava                            #30   Upgrade Firefox i...  3 days ago
Tuesday    opensearch-project/opensearch-api-specification  #793  Warn if file path...  6 days ago
Friday     opensearch-project/opensearch-php                #253  Added optional he...  10 days ago
Thursday   opensearch-project/opensearch-php                #249  Adding @kimpepper...  12 days ago
Wednesday  dblock/code.dblock.org                           #139  Meeting Open Sour...  12 days ago
Wednesday  opensearch-project/opensearch-php                #247  Fix up UPGRADING.md.  12 days ago
Tuesday    opensearch-project/opensearch-api-specification  #772  Evaluate payload ...  13 days ago
Monday     opensearch-project/opensearch-api-specification  #768  Fix flaky snapsho...  14 days ago
Monday     opensearch-project/opensearch-api-specification  #767  Upload test spec ...  15 days ago
Wednesday  dblock/code.dblock.org                           #137  Only vale modifie...  19 days ago

Repeating this pattern using different date ranges and a larger --limit I found 1,566 PRs, including 1,264 PRs into the opensearch-project GitHub organization since April 2021, the time of the OpenSearch fork. Excluding PRs made on Saturday and Sunday I got 1,234. It’s a suspicious number (1, 2, 3, 4, really?), but it checks out. The data says that I made 30 PRs on weekends for work, and 225 PRs not for work. Finally, these exclude the many commits I made into my pet projects directly to main.

We can conclude that only 2% of my paid work was accomplished on weekends.
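The tallying itself is easy to script. The sketch below assumes the query is run with `--json createdAt` and no `--template`, so that `gh` prints a JSON array with ISO 8601 UTC timestamps. The function name `count_by_weekday` and the sample timestamps are hypothetical: the dates match the first three rows of the table above, while the times of day are made up.

```python
import json
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def count_by_weekday(prs_json):
    """Count pull requests per weekday from a JSON array of {createdAt: ...}."""
    counts = Counter()
    for pr in json.loads(prs_json):
        # Timestamps are assumed to look like 2025-01-19T15:00:00Z (UTC).
        created = datetime.strptime(pr["createdAt"], "%Y-%m-%dT%H:%M:%SZ")
        counts[DAYS[created.weekday()]] += 1
    return counts


# Hypothetical sample: real dates from the table above, invented times.
SAMPLE = json.dumps([
    {"number": 32, "createdAt": "2025-01-19T15:00:00Z"},
    {"number": 30, "createdAt": "2025-01-17T15:00:00Z"},
    {"number": 793, "createdAt": "2025-01-14T15:00:00Z"},
])

print(count_by_weekday(SAMPLE))
```

Weekdays here are computed in UTC, so a PR opened late on a Friday evening in New York would count as Saturday. For a rough estimate like this one, that seemed acceptable.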

Which Day is More Productive?

Here’s the distribution of PRs into opensearch-project (aka work) by day of week.

Day	Count
Monday	221
Tuesday	331
Wednesday	193
Thursday	321
Friday	168
Saturday	17
Sunday	13

It looks like Tuesdays and Thursdays are the busiest with fewer results on Friday, which is consistent with my opinion about the benefits of finishing earlier or doing other types of work than code on Fridays. Work is not just pull requests, it’s also attending meetings, and I participated in 2882 meetings in 5 years.
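As a quick sanity check of the numbers, the following sketch recomputes the weekend share from the table above. The names `PRS_BY_DAY` and `weekend_share` are mine, and the counts are copied verbatim from the table.

```python
# Pull requests into opensearch-project by day of week, from the table above.
PRS_BY_DAY = {
    "Monday": 221,
    "Tuesday": 331,
    "Wednesday": 193,
    "Thursday": 321,
    "Friday": 168,
    "Saturday": 17,
    "Sunday": 13,
}


def weekend_share(counts):
    """Fraction of all pull requests that were opened on Saturday or Sunday."""
    weekend = counts["Saturday"] + counts["Sunday"]
    return weekend / sum(counts.values())


total = sum(PRS_BY_DAY.values())
busiest = max(PRS_BY_DAY, key=PRS_BY_DAY.get)
print(f"{total} PRs, busiest day: {busiest}, weekend share: {weekend_share(PRS_BY_DAY):.1%}")
```

The total comes to 1,264 and the weekend share to roughly 2.4%, which is where the 2% figure comes from.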

Should You Work on Weekends?

In my opinion, an experienced Engineer or a People Manager working on weekends sets a bad example. It sets an unreasonable expectation that everyone must be available 7 days a week for productive work, reduces the overall velocity of a team by wearing people out, and diminishes the quality of the code we collectively produce. We humans need to be able to step away from the machine to recharge, and we need time to think. Working harder or more hours is therefore usually not correlated with better results.

What About Looming Deadlines?

Over the years I observed that developers that bang out code on a Sunday tend to bang out a lot less code on Monday. Those on-call on weekends are exhausted the entire week. Individuals that can’t find time to focus on code during the work week tend to compensate by coding on weekends, which demonstrates that they are unproductive during the week - a net negative. Having to work on a weekend to meet a deadline is usually a failure of planning. There are certainly exceptions, but they should be exceedingly rare.

What About Pet Projects and Open-Source?

Pet projects and open-source contributions outside of paid work are a wonderful source of independence and dopamine, and I have several. I like doing that type of work on weekends, but only for a small number of hours, and never feeling that it takes away from my main weekend activities that include spending time with family or friends. I think this attitude evolved over the years as I used to spend a lot of time coding on weekends, but I believe I have consistently tried to make sure that was not paid work that I had to do. Finally, I am lucky to be paid to do open-source work, but I absolutely consider that work and avoid doing it outside of work hours.

Creating Discipline

Not working on weekends and not burning out requires discipline. If you find it difficult to detach yourself from the computer on weekends, begin your Saturday and Sunday with the hardest thing to do. For me, this meant forcing myself to open a book before my laptop, or putting my shoes on and going for a long run, first thing in the morning.

You can do it!

Should You Work on Weekends? was originally published by Daniel Doubrovkine at code.dblock.org | tech blog on January 20, 2025.

2025-01-08T01:00:00+00:00

Most paying customers expect a certain level of (paid) support. One opens a ticket, gets a response, and any issue can be resolved.

How does “support” work in open-source? My rule of thumb is to meet users where they are.

GitHub projects have issues, but opening an issue to ask a question is generally frowned upon. Therefore, in JNA, we have a mailing list, and in OpenSearch, we have a forum and Slack. StackOverflow is quite popular, too.

For a small project I choose GitHub issues as the preferred and only channel. It keeps things well organized and is the least amount of overhead for both users and maintainers. Most questions can be turned into either an issue (e.g. “I have tried this, and did not get the expected results.”), or a feature request (e.g. “Would it be possible to add a feature that works as follows?”). I like to use templates, such as these to help guide users in opening issues with enough information to get meaningful help.

For a larger project that has multiple areas of interest I like Slack with dedicated channels for various areas, such as the #clients channel on the OpenSearch public Slack. As a (co)maintainer of OpenSearch clients my responsibilities are as follows.

Create the channel. This is a public channel so that anyone in the workspace can join and participate, and someone has to make one.
Set channel purpose and topic. Clearly define the purpose and topic of the channel. For example, the purpose could be “A space for discussing contributions to our project, including bug reports, feature requests, and code contributions.” The topic can provide a brief overview of the contribution guidelines and processes.
Pin important messages. Pin any important messages or documents related to contributing to the project, such as the contribution guidelines, code of conduct, or relevant links to your project’s repository or issue tracker.
Encourage participation. Announce the new channel to community members in #general and encourage everyone to join and actively participate in the conversation. You can share the channel link or invite them directly.
Moderate and respond. Monitor the channel regularly. Respond to questions, provide guidance, and ensure that discussions remain constructive and aligned with the project’s contribution guidelines. I usually sweep my channels bi-weekly and respond to any message that has not been answered by someone else.
Triage and assign issues. When contributors report bugs or suggest new features, ask them to create issues in the project’s issue tracker, and share the links in the channel to close the loop. Work with the members of the channel to try and assign someone to investigate and address these issues when they are willing to help.
Celebrate contributions. Recognize and celebrate contributions from community members. Share updates on merged pull requests, thank contributors, or highlight notable contributions in the channel.
Provide feedback. Encourage contributors to share their experiences and provide feedback on the contribution process. Use this feedback to improve the guidelines, documentation, or processes as needed.
Share updates and announcements. Use the channel to share releases, updates, announcements, or changes related to your project’s contribution processes or guidelines.

An interesting aspect of the open-source OpenSearch is that it’s also offered as a paid managed service on AWS. Therefore, it’s common for an AWS customer to ask a question about using OpenSearch in the managed environment, which is mostly identical to using the open-source product. Many such customers are willing to do some work and will often help narrow the problem down by trying the open-source version. Such customers always appreciate the closer feedback loop of the developer community on Slack, and the ability to resolve their issues in code by themselves, which is another example of meeting your ~~users~~ customers where they are.

Meeting Open Source Users Where They Are was originally published by Daniel Doubrovkine at code.dblock.org | tech blog on January 08, 2025.

2024-12-30T01:00:00+00:00

I found it annoyingly non-trivial to add a spell checker to this blog.

For now, I settled on GitHub Spellcheck Action that uses PySpelling on files changed in the commit or pull request as described in this blog post.

name: Check Spelling
on: [push, pull_request]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: tj-actions/changed-files@v46.0.4
        id: changed_files
        with:
          files: |
            **/**.md
            **/**.markdown
      - name: Check Spelling
        uses: rojopolis/spellcheck-github-actions@0.45.0
        with:
          task_name: Markdown
          config_path: .pyspelling.yml
          source_files: ${{ steps.changed_files.outputs.all_changed_files }}

To run PySpelling locally ensure you have a working version of Python, install PySpelling with pip install pyspelling, and aspell with brew install aspell on a Mac. In my configuration I also use pymdownx from pymdown-extensions which is installed with pip install pymdown-extensions.

You need a .pyspelling.yml and you can run it as follows.

pyspelling --config .pyspelling.yml

This is a Jekyll blog in which we want to ignore code, wrapped between Jekyll magic commands for syntax highlighting. This can be accomplished with a PySpelling pipeline in the above-mentioned configuration file.

pipeline:
  - pyspelling.filters.context:
      context_visible_first: true
      delimiters:
        # ignore jekyll multiline magic highlights {% ... %}
        - open: '(?s)^\{\% highlight .* \%\}$'
          close: '^\{\% endhighlight \%\}$'
        # ignore the rest of jekyll magic commands
        - open: '{%'
          close: '%}'
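To get a feel for what these two delimiter rules hide from the spell checker, here is a rough approximation with plain regular expressions. This is not PySpelling's implementation, only a sketch of the effect, and `strip_jekyll_code` is a hypothetical helper name.

```python
import re

# A whole {% highlight ... %} ... {% endhighlight %} block, tags on their own lines.
HIGHLIGHT_BLOCK = re.compile(
    r"^\{% highlight .*? %\}$.*?^\{% endhighlight %\}$",
    re.DOTALL | re.MULTILINE,
)
# Any other Liquid tag, such as {% post_url ... %}.
LIQUID_TAG = re.compile(r"\{%.*?%\}", re.DOTALL)


def strip_jekyll_code(text):
    """Drop highlighted code blocks first, then any remaining Liquid tags."""
    text = HIGHLIGHT_BLOCK.sub("", text)
    return LIQUID_TAG.sub("", text)


post = (
    "Hello wrld\n"
    "{% highlight ruby %}\n"
    "puts 'xyzzy'\n"
    "{% endhighlight %}\n"
    "See {% post_url foo %} here\n"
)
print(strip_jekyll_code(post))
```

In this sketch the order matters: the highlight blocks have to go first, otherwise the generic rule would remove only the opening and closing tags and leave the code in between for the spell checker.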

Finally, we can collect the initial set of words to potentially exclude from existing posts into .pyspelling.words.

pyspelling --config .pyspelling.yml | \
  sed -n '/^Misspelled words:$/,/^-*$/!p' | \
  grep -v "^---\+$" | \
  grep -v "^$" | \
  sort | \
  uniq > .pyspelling.words
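The pipeline above is terse, so here is a sketch of the same filtering in Python. It assumes that the PySpelling report consists of blocks that start with a `Misspelled words:` line, followed by a context line, a dashed separator, the words, and another dashed separator. The helper name `extract_words` and the sample report are hypothetical.

```python
import re


def extract_words(report):
    """Collect the unique words from a PySpelling report, sorted."""
    words = set()
    in_header = False
    for line in report.splitlines():
        if in_header:
            # Like the sed range: skip until a line of dashes (or an empty line).
            if re.fullmatch(r"-*", line):
                in_header = False
            continue
        if line == "Misspelled words:":
            in_header = True
            continue
        # Like the two grep -v calls: drop separators and empty lines.
        if line == "" or re.fullmatch(r"-{3,}", line):
            continue
        words.add(line)
    return sorted(words)


# Hypothetical report, following the layout assumed above.
SAMPLE_REPORT = "\n".join([
    "Misspelled words:",
    "<htmlcontent> _posts/example.md: html>body>p",
    "--------------------",
    "kerfuffle",
    "shadowenv",
    "--------------------",
    "",
    "Misspelled words:",
    "<htmlcontent> _posts/other.md: html>body>p",
    "--------------------",
    "shadowenv",
    "pymdownx",
    "--------------------",
])

print(extract_words(SAMPLE_REPORT))
```

Note that Python sorts by code point, while `sort` follows the current locale, so the order of the resulting file may differ slightly between the two.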

The file helped me spot a few spelling mistakes, now fixed. See code.dblock.org#134 for the full change.

Update: I eventually switched to using regular backticks for code when adding a style checker in #136, so the above delimiters rule changed as follows.

- pyspelling.filters.context:
    context_visible_first: true
    escapes: \\[\\`~]
    delimiters:
      - open: "(?s)^(?P<open> *`{3,})([a-z+]*)$"
        close: "^(?P=open)$"
      - open: "(?P<open>`+)"
        close: "(?P=open)"
      - open: '{%'
        close: '%}'

Adding a Spell Checker to a Jekyll Blog was originally published by Daniel Doubrovkine at code.dblock.org | tech blog on December 30, 2024.

2024-12-20T01:00:00+00:00

Since my first day in AWS 5 1/2 years ago, I’ve been experimenting with keeping a CHANGELOG of everything I do, available for everyone at the company to see. I wrote about it here. If you work at AWS, use the URL in the screenshot below to find it.

I wrote a little script to try and classify where I spent my time. Here are some quick, very unscientific results.

I participated in 2882 meetings: 75% were probably technical discussions and 25% were not, including business reviews. I made 1013 code contributions and reviewed or read 364 documents, mostly technical designs. I did 192 1:1s with my colleagues, 130 interviews, and helped with 80 promos. I debugged or otherwise dealt with 175 customer-facing issues, authored 78 documents, took 64 trainings, and engaged in 48 planning exercises.

This, of course, doesn’t tell you anything useful. If you actually want to learn more about what PEs do at Amazon, check out the excellent Principal Engineer Roles Framework that I have been using since 2019 and that was made public this week by Mai-Lan Tomsen Bukovec.

I Participated in 2882 Meetings in 5 Years was originally published by Daniel Doubrovkine at code.dblock.org | tech blog on December 20, 2024.

2024-12-19T01:00:00+00:00

Have you watched The IT Crowd? It’s a hilarious British television sitcom from around 2006 that cast a bunch of IT geniuses at the Reynholm Industries tech support department in London. One of the signature laughs is that every time the phone rang, Roy would pick it up and without waiting say “Have you turned it off and on again?”, then hang up. I often feel like Roy when engaging with users reporting bugs in open-source projects I maintain.

Here’s my structured approach for any bug being reported in my open-source projects.

Do not fix the bug. Because this is open source, and we are not selling software, maintainers may have some social responsibility, but are under no obligation to do anything.
Do not try to reproduce the bug. Not convinced it’s a bug? Cannot reproduce the issue? Politely request additional information or clarification.
Do not try to write a test that proves this is a bug. Ask the person reporting the issue to try and write the automated and failing test. This helps narrow down the problem and ensures that the bug stays fixed in the future.
Do not fix the bug. Now that the person reporting the bug has written an automated test for it, they are this close to fixing it. Ask them to do it. This gives the individual reporting the bug a sense of ownership and contribution to the project.
Do not do anything else. Can’t get any engagement on the bug from the person reporting it? Leave the bug open. Someone else will pick it up.

tl;dr Do not fix the bug! A healthy open-source project will have lots of engaged contributors, especially when it comes to bugs. This is one of the lowest hanging fruit that you can harvest as a maintainer. That said, sometimes I just want to fix the bug myself because it’s so interesting. So YMMV.

Do Not Fix Bugs Reported in Your Open Source Projects was originally published by Daniel Doubrovkine at code.dblock.org | tech blog on December 19, 2024.

2024-12-18T01:00:00+00:00

Every active open-source project grows a lot of low hanging fruit. Encouraging project users to harvest some of the yield can be a great way to engage and retain new contributors. Here are some ideas, mostly borrowed from Barani, for low-hanging work items that you can propose to anyone engaging in your project.

Help Wanted, Good First Issue and Summer of Code

Identify and tag issues that have an easy solution to attract first time contributors.

Good first issues, or issues that are part of programs such as Summer of Code, provide an entry point for new contributors who want to get involved with the project, but are intimidated or unsure where to start. These issues are typically well-defined, self-contained, and relatively straightforward, making them accessible to newcomers.

I could use help with 58 issues right now and there are 683 issues labelled “good first issue” in the opensearch-project GitHub organization.

READMEs and Documentation

README.md, documentation and “Getting Started” for developers are often outdated or overlooked in open-source projects. Contributors can begin by fixing typos, improving clarity, or adding examples to existing documentation. This is a great way for new contributors to get familiar with the project’s codebase and contribute without making significant code changes.

The recent transfer of the OpenSearch Project to the Linux Foundation required a change to the code-of-conduct violations e-mail address in CODE_OF_CONDUCT.md, made in opensearch-project/.github#245. The same trivial change needs to happen in 116 repositories. We can get some help with automation, but someone still needs to do it.

Updating Dependencies

Most code projects rely on external dependencies, and keeping them up-to-date is crucial for security and compatibility reasons. Contributors can help by identifying outdated dependencies and submitting pull requests to update them to the latest versions.

Almost all projects in the mongoid GitHub organization would appreciate PRs that add support for newer versions of MongoDB, which is most likely no work other than adding new versions to the CI matrix similar to mongoid-scroll#45. Another example is upgrading Gradle in OpenSearch, such as OpenSearch#16694, something that had to be done over 100 times over the last 3 years and could not be automated because it required minor code changes along the way.

Improving Test Coverage

Writing tests is an essential part of software development, but it’s not uncommon for open-source projects to have gaps in their test coverage or no tests at all. Contributors can help by adding new test cases or improving existing ones, ensuring that the project’s codebase is thoroughly tested. Adding code coverage is another easy contribution.

I have recently opened statusphere-example-app#26 that added tests to the atproto boilerplate example and mongoid-scroll#47 that added test coverage.

Refactoring Code

As projects evolve, their codebase can become messy and difficult to maintain. Contributors can help by identifying areas that need refactoring and submitting pull requests to improve code readability, maintainability, and performance.

Eric, one of the maintainers of ruby-grape, has been steadily improving the library’s performance, such as in grape#2513, and refactoring various parts, such as in grape#2475.

Fixing Linting Errors

Most projects have automated linting rules in place to enforce coding standards and best practices. Contributors can help by identifying and fixing linting errors, ensuring that the codebase adheres to the project’s coding standards.

You can begin by adding a linter if there isn’t one, such as in opensearch-api-specification#312, upgrading RuboCop in Ruby projects, such as in grape#2497, or fixing actual violations being ignored in .rubocop_todo.yml, such as in grape#2459.

Reporting and Engaging on Issues

Contributors can help by reporting new issues, reproducing existing ones, moving issues forward, or helping categorize issues to help maintainers prioritize their work.

For example, I figured out how to reproduce a rare bug in logstash-output-opensearch in logstash-output-opensearch#207, and helped a user enable verbose logging to debug another problem in logstash-output-opensearch#187.

Creating Examples or Tutorials

Providing examples and tutorials can make it easier for new users to understand and adopt a project. Contributors can create new examples, tutorials, or sample projects to help others get started with the project.

For example, as I was researching various vector database APIs, I wrote a set of working samples for every database. I also implemented an example of using the OpenSearch async Python client with AWS in opensearch-py#857 when someone asked me for help.

Improving Project Infrastructure

Open-source projects often rely on various tools and infrastructure, such as continuous integration, deployment pipelines, or issue tracking systems. Contributors can help by identifying areas for improvement and submitting pull requests to enhance the project’s infrastructure.

For example, I added a feature that makes CI fail when someone accidentally forgets to reference a test suite in the test matrix in opensearch-api-specification#705 after someone else came up with this idea.

Reviewing Code

Reviewing other people’s code can be an easier place to start than writing code. Even if reviewers don’t have all the knowledge and can’t review the entire change, they may still be able to point out common issues or raise good questions. If their reviews prove valuable to others, and as their knowledge grows, they may consider contributing code as well.

For example, an experienced PHP developer who had never contributed to the project found work in progress by another developer for a problem they had, and jumped in to help review it in opensearch-php#233. I gently nudged both the contributor and the reviewer to get that PR finally merged.

In conclusion, by proposing these low-hanging work items, we can encourage new contributors to get involved and make meaningful contributions to open-source projects. It’s important to provide clear guidelines, support, and mentorship to help contributors navigate the project’s codebase and contribution process. So go ahead, engage with a new project participant, ask them what they would be interested in working on, and gently guide them to make their first contribution.

Harvesting the Low Hanging Fruit in Open Source Projects was originally published by Daniel Doubrovkine at code.dblock.org | tech blog on December 18, 2024.

2024-12-14T01:00:00+00:00

I’ve been very active in open source since around 2008 and have become very passionate about growing contributors and maintainers in the projects I am involved in. The health of an open-source project can easily be measured by the number of actively engaged participants in its code, which cannot happen without maintainers. Maintainers are leaders driving open-source projects. The best maintainers contribute significant code, have the responsibility to review other people’s contributions, and perform the release process. In my experience, the best ones are experts in the system they are developing. They are prolific coders, and are able to collaborate with contributors that don’t work for the same business, maintain a high quality bar, sweat the small stuff, and recognize behaviors that earn trust in others. The goal of a project maintainer is to always improve the project, ensure it stays the course of its core mission, and to work themselves out of the job by promoting contributors to maintainer positions. This is often referred to as meritocracy.

I hold two possibly controversial opinions. First, to attract long-lasting contributors that can become maintainers, a project needs an owner, sometimes known as a benevolent dictator. Second, contributors need to have a business need, or an unusually strong personal interest, to stick around.

I have yet to see it work differently, and I will illustrate this with some examples from my own experience.

In 2010 I was working on a database security analytics product. We were paying an exorbitant amount of money for a software license to get basic hardware information. To save costs I created OSHI, an operating systems hardware information library written in Java. The first contributor to the project came in 2013, a handful in 2015, 8 became project co-maintainers, and one emerged as the de facto benevolent dictator by 2018, with 201 people contributing to 158 releases as of today. It is the de facto cross-platform standard for enumerating hardware components in Java.

The same year our Enterprise product needed a single-sign-on solution for our Java web application, so I created and open-sourced Waffle, which is based on Java Native Access that I began contributing to, quickly becoming JNA’s co-maintainer. Today, Waffle has 4 maintainers, a new benevolent dictator since 2014, and 52 people have contributed to 22 releases. The JNA project is hugely popular, used in anything from OpenSearch to IntelliJ, but only has 9 maintainers, is on its 4th benevolent dictator, and 166 people have contributed to 65 releases.

In 2011, while building Artsy, we standardized on Grape, a RESTful API framework alternative to Ruby-on-Rails. I began contributing to the project and earned maintainer rights to it around 2013, eventually becoming the benevolent dictator. I created the ruby-grape organization, and brought a number of related projects together in 2015. The core Grape project has 7 maintainers today, and 426 people have contributed to 73 releases.

Finally, in 2019 I joined the OpenSearch Project team at Amazon to work on the fork that powers the Amazon Managed and Serverless OpenSearch service. The project is now part of the Linux Foundation. I am an active co-maintainer of 24 out of 116 repositories in the opensearch-project org on GitHub and benevolent dictator in a few. It has 257 maintainers today, and 1,402 people have contributed to hundreds of releases across the distribution of OpenSearch and OpenSearch Dashboards, ingestion tools, and clients.

To understand the patterns behind these stories, let’s examine how projects attract contributors and grow them into maintainers.

Earlier this year I reached out to the existing maintainers across opensearch-project who did not work for Amazon, and asked them how they became maintainers. Twenty replied. I learned that most came to the project as their businesses were migrating from Elasticsearch to OpenSearch, often because their employer had business needs in OpenSearch, and remained active because they were pleased with how well-maintained, healthy, and high quality these projects were.

This is the recipe that we have followed for 3 years in the repos that I have been involved in.

At least one highly engaged maintainer that acts as the project owner or benevolent dictator, and routinely commits code to the repo.
Open-source basics including a license, code-of-conduct, security and copyright files and statements.
A well-written README.md, project documentation, and a developer guide that one can easily follow to begin contributing without help.
High test coverage, code linters, and a reliably passing CI/CD.
Engagement from at least one maintainer on every single issue, pull request or comment within 24 hours.
Visible recent commit activity, even if just from a dependency manager such as dependabot.
Regular releases, a CHANGELOG or release notes.

I often step up to do this in repos I am engaged in and, naturally, in all repos I create. My job as a maintainer is to make all the above points happen as a prerequisite to anyone engaging in a project.

A user with a business need will typically open an issue before contributing code. As a maintainer, I will respond to their inquiry in a way that encourages contribution, even a minimal one. For example, if a user opens a bug, I do not jump in to fix it, especially if it’s an easy fix. My typical response is: Thank you for reporting this issue! Looks like a bug. Would you like to help? Maybe write a (failing) test that reproduces this issue? Contributors that write a test often also fix the bug. If a user asks how to do something, I help them and ask them to contribute a sample or additional documentation.

When reviewing pull requests (always within 24 hours) my goal is to get the contribution merged. How can I say yes more often? I try to make sure I am clear about what change is a must have, should have, or nice to have. I reserve must have’s for critical misses (e.g. poor APIs that will be hard to change later), and always sweat the small stuff and insist on highest standards (e.g. tests and documentation are always required). This is because one’s contribution is never allowed to create more work or technical debt for me or other participants in the project. Finally, I often thank contributors profusely for hanging in there with me through multiple iterations, and apologize for my excessive OCD over periods at the end of sentences. In my experience people take this attention to detail really well and are motivated to contribute to a project that keeps the quality high, often higher than their projects at work that make catastrophic technical compromises to reduce time to market.

Lastly, if you made any significant, high-quality contribution to one of my open-source projects, and have a decent contribution history on GitHub, I will quickly ask you to join as a co-maintainer. The bar is quite low: I just need to see that you can be trusted to exhibit the behavior described above as a maintainer. In some ways, this ask is a non-automated, higher touch, softer implementation of the Moya Contributing Guidelines, which I wrote about here. When people step up I usually step out and let them take over, working myself out of the job. Thus, a strong community of contributors grows around the project, which is the real measure of success.

Do you follow a similar recipe? Have you seen open-source projects emerge and succeed in different ways? Let me know!

How to Grow Open-Source Contributors and Maintainers was originally published by Daniel Doubrovkine at code.dblock.org | tech blog on December 14, 2024.

2024-11-21T01:00:00+00:00

I’ve previously written about the OpenSearch OpenAPI Specification and how it can be used to fix bugs in the OpenSearch documentation.

Most of the tests in the API spec work with a generic Docker container and some basic network setup to expose a local 9200 port.

version: '3'

services:
  opensearch-cluster:
    image: opensearchproject/opensearch:latest
    ports:
      - 9200:9200
      - 9600:9600
    environment:
      - OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m
      - discovery.type=single-node

When testing an optional plugin we install it in a Dockerfile, such as in the following analysis test suite.

ARG OPENSEARCH_DOCKER_HUB_PROJECT
ARG OPENSEARCH_VERSION
ARG OPENSEARCH_DOCKER_REF

FROM opensearchproject/opensearch:latest
RUN ./bin/opensearch-plugin install -b analysis-icu
RUN ./bin/opensearch-plugin install -b analysis-phonenumber

The docker-compose.yml references this Dockerfile in build.

  opensearch-cluster:
    build:
      context: .

Testing more advanced APIs in OpenSearch requires a setup that also imports data. For example, OpenSearch has a concept of so-called dangling indexes. After a node joins a cluster, dangling indexes occur if any shards exist in the node’s local directory that do not already exist in the cluster. To fabricate a dangling index one must create a cluster that loses quorum. Fortunately, we have available to us an excellent talk by Amitai Stern that goes into great depth on data migrations between clusters and talks about a method called “node toss” at around 17:50 that puts node indexes into a dangling state. Amitai has also authored a cookbook for re-creating a dangling index in a Docker container.

Here’s how it works.

We begin by creating a single-node cluster without security that uses a local directory (opensearch-data1) to store data. Note the --entrypoint /bin/bash which opens a shell instead of starting the cluster.

docker run --name opensearch-single-node-cluster \
  -it --entrypoint /bin/bash \
  -p 9200:9200 -p 9600:9600 \
  -e "discovery.type=single-node" \
  --ulimit memlock=-1:-1 \
  --ulimit nofile=65536:65536 \
  -e DISABLE_INSTALL_DEMO_CONFIG=true \
  -e DISABLE_SECURITY_PLUGIN=true \
  -v $(pwd)/opensearch-data1:/usr/share/opensearch/data \
  opensearchproject/opensearch:latest

We manually start it with ./opensearch-docker-entrypoint.sh, then insert some data from another shell.

curl -X POST http://localhost:9200/movies/_doc --json '{"director":"Bennett Miller","title":"The Cruise","year":1998}'

{
  "_index": "movies",
  "_id": "B0VvT5MBVIiutAAib7dO",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 2,
  "_primary_term": 2
}

Remove the index replicas.

curl -X PUT  http://localhost:9200/movies/_settings --json '{"index.number_of_replicas":0}'

{"acknowledged":true}

Detach the cluster from the Docker shell.

./bin/opensearch-node detach-cluster

Confirm [y/N] y
Node was successfully detached from the cluster

Exit the container with exit and start a new 2-node cluster on top of this data with this docker-compose.yml that mounts our previously detached data.

services:
  opensearch-node1:
    ...
    volumes:
      - ./opensearch-data1:/usr/share/opensearch/data
    ports:
      - 9200:9200
      - 9600:9600
  opensearch-node2:
    ...

After about 30 seconds the cluster figures out what to do, and creates the desired dangling indexes out of the existing data in node1.

curl -X GET http://localhost:9200/_dangling

{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "docker-cluster",
  "dangling_indices": [
    {
      "index_name": "movies",
      "index_uuid": "p6tliBznQO-FzTdslShrwA",
      "creation_date_millis": 1732195242038,
      "node_ids": [
        "66ikk5pNQn-W9H0UN08OgA"
      ]
    }
  ]
}
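The index_uuid in this response is the value that goes into the path of the recover and delete calls. As a rough sketch, it can be pulled out of a pretty-printed response with sed and turned into a URL. The function names and the abbreviated sample are made up for illustration, and the approach assumes one index_uuid per line.

```shell
# Hypothetical helper: print every index_uuid found in a pretty-printed
# _dangling response read on stdin (assumes one index_uuid per line).
dangling_uuids() {
  sed -n 's/.*"index_uuid": *"\([^"]*\)".*/\1/p'
}

# Abbreviated copy of the response above.
sample_response() {
  cat <<'EOF'
{
  "dangling_indices": [
    {
      "index_name": "movies",
      "index_uuid": "p6tliBznQO-FzTdslShrwA"
    }
  ]
}
EOF
}

# Build the URL that a recover (POST) or delete (DELETE) call would target.
uuid=$(sample_response | dangling_uuids)
echo "http://localhost:9200/_dangling/${uuid}?accept_data_loss=true"
```

If jq is installed, `jq -r '.dangling_indices[].index_uuid'` is a more robust way to do the same extraction, since it does not depend on how the JSON is laid out.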

Trying to recover this data will not work because the cluster has lost quorum and there’s no elected cluster manager.

curl -X POST 'http://localhost:9200/_dangling/NAXLxxvEQfKHwRzxA7nxRw?accept_data_loss=true'

{
  "error": {
    "root_cause": [
      {
        "type": "cluster_manager_not_discovered_exception",
        "reason": "no cluster-manager to send allocate dangled request"
      }
    ],
    "type": "cluster_manager_not_discovered_exception",
    "reason": "no cluster-manager to send allocate dangled request"
  },
  "status": 503
}

Deleting the dangling index will also hang waiting for a cluster manager.

We can turn our 2-node quorum loss cluster into a healthy single-node cluster with these dangling indexes. Attempting to switch the cluster to discovery.type=single-node will not work because the node is part of a 2-node cluster that lost quorum.

opensearch-node1-1  | [2024-11-21T16:22:38,376][ERROR][o.o.b.OpenSearchUncaughtExceptionHandler] [opensearch-node1] uncaught exception in thread [main]
opensearch-node1-1  | org.opensearch.bootstrap.StartupException: java.lang.IllegalStateException: cannot start with [discovery.type] set to [single-node] when local node {opensearch-node1}{UvfTjEerS76oc3RM5meQug}{8-SaRbGfSP64wLDkSwubNA}{172.19.0.2}{172.19.0.2:9300}{dimr}{shard_indexing_pressure_enabled=true} does not have quorum in voting configuration VotingConfiguration{_must_join_elected_cluster_manager_}

We need to properly detach one of the nodes. This is done by setting the number of replicas to zero, and removing node2 from voting.

curl -X PUT  http://localhost:9200/_settings --json '{"index.number_of_replicas":0}'
curl -X POST 'http://localhost:9200/_cluster/voting_config_exclusions?node_names=opensearch-node2'

A single node cluster can now start on top of this data.

We could mount this data directly via docker-compose, but we don’t want test runs to modify the local copy of our data files. Instead, we copy them into the Docker container every time in order to have a stable initial state. Copying files causes OpenSearch to think that they were tampered with externally.

opensearch-cluster-1  | [2024-11-21T15:02:27,832][ERROR][o.o.b.OpenSearchUncaughtExceptionHandler] [0e69ab6a4227] uncaught exception in thread [main]
opensearch-cluster-1  | org.opensearch.bootstrap.StartupException: org.apache.lucene.store.AlreadyClosedException: Underlying file changed by an external force at 2024-11-21T15:02:27.801461006Z, (lock=NativeFSLock(path=/usr/share/opensearch/data/nodes/0/_state/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2024-11-21T15:02:21.011461002Z))

The solution is to get rid of lock files with find . -name '*.lock' | xargs rm. Now our local data is usable in a single-node cluster and this is what we add to git. Finally, copied files also have to be writable. We add COPY --chown=1000:1000 .data/opensearch-data1 /usr/share/opensearch/data to Dockerfile and end up with a single-node cluster with dangling indexes that can be deleted or recovered.
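The lock file cleanup can be tried safely on a throwaway directory first. The sketch below is an illustration, not the exact command from the change: the function name is hypothetical, the directory layout only imitates the write.lock path from the error above, and node-0.st is a made-up stand-in for a data file that must survive.

```shell
# Hypothetical helper: delete every *.lock file under the given directory.
# The pattern is quoted so the shell passes it to find unexpanded.
remove_lock_files() {
  find "$1" -name '*.lock' -type f -exec rm -f {} +
}

# Throwaway directory imitating the layout seen in the error message.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/nodes/0/_state"
touch "$demo_dir/nodes/0/_state/write.lock" "$demo_dir/nodes/0/_state/node-0.st"

remove_lock_files "$demo_dir"

# List what is left: only the stand-in data file remains.
remaining=$(cd "$demo_dir" && find . -type f | LC_ALL=C sort)
echo "$remaining"
rm -rf "$demo_dir"
```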

curl -X DELETE 'http://localhost:9200/_dangling/p6tliBznQO-FzTdslShrwA?accept_data_loss=true'

{"acknowledged":true}

Note that DELETE and POST _dangling return 202 Accepted and not 200 or 201.

See opensearch-api-specification#686 for details.

Creating an OpenSearch Cluster with Dangling Indexes was originally published by Daniel Doubrovkine at code.dblock.org | tech blog on November 21, 2024.