Claude Code Experiments with BIND 9

During the first week of February (2026), I ran several experiments using Claude Code. These included 1) asking Claude to analyze BIND 9 and recommend changes, 2) creating several prototype new projects, 3) advising me on home construction, and 4) helping me prepare materials for a university class I am teaching.

The BIND 9 Experiment

[nb: We have recently established Guidelines for Tool-Generated Content for the BIND project.]

I imported the BIND 9 source code and let Claude Code analyze it. Then I asked it to find problems in the source code, with a focus on security issues, to modernize the code, and to fix these problems one by one using git worktrees.

I did not make any changes in BIND based on Claude’s recommendations. My evaluation, at the end of this experiment, was that Claude generated technically correct but practically useless code: When analyzing the BIND 9 source code, Claude found “issues” like reserved identifiers and potential integer overflows that compilers already handle. It was technically sound, but completely useless and a waste of time in a mature, real-world context.

First, I asked Claude to analyze the libisc component of BIND 9.

The following branches were created in separate worktrees:

~/Projects/bind9-fix-endian          0a70abfeeb [fix/endian-inline-functions]
~/Projects/bind9-fix-error           4638ad205b [fix/error-atomic-callbacks]
~/Projects/bind9-fix-magic           ed2ea6d675 [fix/magic-shift-ub]
~/Projects/bind9-fix-pause           8f7f43b098 [fix/pause-reserved-id]
~/Projects/bind9-fix-proxy2-comment  469ec1580b [fix/proxy2-comment]
~/Projects/bind9-fix-proxy2-range    fa14f684fd [fix/proxy2-range-check]
~/Projects/bind9-fix-readline        b3b55efd20 [fix/readline-malloc-check]
~/Projects/bind9-fix-serial          d9e3925e5e [fix/serial-symmetric-checks]

Let’s go through all of them:

fix/endian-inline-functions

In this branch, Claude Code suggested modernizing the code to use inline functions instead of macros. The branch contains a lot of changes like this:

-#define ISC_U16TO8_LE(p, v)      \
-       (p)[0] = (uint8_t)((v)); \
-       (p)[1] = (uint8_t)((v) >> 8);
+static inline void
+ISC_U16TO8_LE(uint8_t *p, uint16_t v) {
+       p[0] = (uint8_t)(v);
+       p[1] = (uint8_t)(v >> 8);
+}

Generally speaking, inline functions have the advantage over macros that one gets better backtraces, but personally, I would not use inline functions for endianness macros like this. Most probably there is no performance impact, but these functions are used in low-level cryptography routines, and meddling with them would require extra measurements that would consume time better spent elsewhere.

Claude Code produced the following commit message:

    Convert multi-statement endian macros to inline functions

    The ISC_U*TO8_* write macros were unsafe multi-statement macros that
    could break in conditional contexts:

      #define ISC_U16TO8_LE(p, v) \
          (p)[0] = (uint8_t)((v)); \
          (p)[1] = (uint8_t)((v) >> 8);

    If used in: if (x) ISC_U16TO8_LE(p, v);
    Only the first statement would be conditional.

    Traditional fix: do { ... } while(0) wrappers
    Modern C17/C23 fix: Use static inline functions for type safety and
    proper statement semantics.

    Converted all 8 write macros to inline functions:
    - ISC_U16TO8_LE/BE (uint8_t *p, uint16_t v)
    - ISC_U32TO8_LE/BE (uint8_t *p, uint32_t v)
    - ISC_U48TO8_LE/BE (uint8_t *p, uint64_t v)
    - ISC_U64TO8_LE/BE (uint8_t *p, uint64_t v)

    Read macros (ISC_U8TO*) remain as expression macros since they return
    values by expression, not side effects.

    Benefits over do-while wrappers:
    - Type safety: compiler checks pointer and value types
    - Better debugging: functions show in stack traces
    - No semicolon footgun with do-while
    - Modern C idiom (inline functions are standard since C99)

As a side note, there was no do-while in the entire header, so I have no idea what it is talking about 🤷

fix/error-atomic-callbacks

In this commit, Claude Code changed our assertion callbacks from plain function pointers to atomic function pointers. While technically correct, the “fatal” callbacks are always set before threading starts and are never changed afterwards. While not enforced, the functions isc_error_setunexpected and isc_error_setfatal are meant to be called when setting up the program, not mid-run. Here Claude Code went with formalism instead of understanding how the functions are actually used. Here’s the associated commit message:

    Fix data race in error.c callback setters using C11 atomics

    The error callback pointers were stored in static globals and accessed
    without synchronization, creating a data race if callbacks were changed
    while errors were being reported from other threads.

    Before (data race):
      static isc_errorcallback_t unexpected_callback = ...;

      void isc_error_setunexpected(cb) {
          unexpected_callback = cb;  // unsynchronized write
      }

      void isc_error_unexpected(...) {
          unexpected_callback(...);   // unsynchronized read
      }

    After (thread-safe):
      static _Atomic(isc_errorcallback_t) unexpected_callback = ...;

      void isc_error_setunexpected(cb) {
          atomic_store(&unexpected_callback, cb);
      }

      void isc_error_unexpected(...) {
          cb = atomic_load(&unexpected_callback);
          cb(...);
      }

    Using C11 atomics (_Atomic type qualifier with atomic_store/atomic_load)
    ensures proper memory ordering and eliminates the race condition.

    Note: Direct initialization without ATOMIC_VAR_INIT() is used, as that
    macro was deprecated in C17 and removed in C23. Modern C allows normal
    initialization of atomic objects.

As a side note, I believe I had to push it not to use ATOMIC_VAR_INIT(). Also, there is no actual data race.

fix/magic-shift-ub

Here, Claude Code caught a potential signed integer overflow – except that it also correctly notices there is no such case in the whole code base, and the compiler is going to yell at us if one ever appears. Again, here’s the associated commit message:

    Fix signed integer overflow UB in ISC_MAGIC macro

    The ISC_MAGIC macro constructed magic numbers by left-shifting
    potentially signed integer arguments:

      #define ISC_MAGIC(a, b, c, d) ((a) << 24 | (b) << 16 | ...)

    If any byte value >= 0x80 (e.g., 0x80 << 24 = 0x80000000), this
    caused signed integer overflow, which is undefined behavior in C.

    Fixed by casting to uint8_t (to ensure byte values) and then to
    uint32_t before shifting, ensuring all operations use unsigned
    arithmetic:

      #define ISC_MAGIC(a, b, c, d) \
          (((uint32_t)(uint8_t)(a) << 24) | ...)

    All current uses pass ASCII characters (< 0x80), so behavior is
    unchanged. This fix prevents UB if the macro is ever used with
    high-bit bytes.

Again, the change is technically very sound and is correct, but it is also useless and a waste of time.

fix/pause-reserved-id

The pattern of being correct and useless at the same time shows in this proposed fix as well. Claude Code is not wrong that double-underscore identifiers are reserved by the C standard; however, the __pause variable is not causing any problems right now, and the same pattern appears in isc/uv.h, isc/util.h, and many other headers. The proposed commit message follows:

    Fix reserved identifier in isc_pause_n macro

    The isc_pause_n macro used __pause as a loop counter variable,
    which is a reserved identifier in C (double underscore prefix is
    reserved for the implementation).

      #define isc_pause_n(iters) \
          for (size_t __pause = 0; __pause < iters; __pause++) { ... }

    This violates C standard section 7.1.3 (Reserved identifiers).

    Fixed by renaming to _isc_pause_i, which uses the single underscore
    + lowercase letter pattern that's allowed in function scope:

      #define isc_pause_n(iters) \
          for (size_t _isc_pause_i = 0; _isc_pause_i < iters; ...) { ... }

    This macro is only used in spinlock_test.c and rwlock_test.c.

Again, 🤷, whatever.

fix/proxy2-comment

A wrong comment, sure. Is it worth a separate commit and a separate MR? Probably not. Proposed commit message:

    Fix copy-paste error in proxy2.c IPv6 comment

    The comment in the ISC_PROXY2_AF_INET6 case block incorrectly said
    "IPv4 source and destination endpoint addresses" when it should say
    "IPv6".

    This was a copy-paste error from the IPv4 case block above.

    No functional change, documentation fix only.

fix/proxy2-range-check

Here, Claude Code correctly detected an edge condition where a valid oversized DNS packet was rejected, and prepared a unit test for the condition. This is one of the few proposed commits that might end up being merged, as it has some merit. At the same time, the real fix is that we should probably sanitize oversized queries earlier and simply not pass them downstream.

    Fix proxy2 header range check to allow maximum valid payload

    The check in isc_proxy2_make_header() was rejecting valid headers with
    payload sizes between 65520-65535 bytes.

    The PROXYv2 header has a uint16_t len field that stores the payload
    size (total_size - ISC_PROXY2_HEADER_SIZE). The maximum valid len is
    UINT16_MAX (65535), so the maximum valid total_size is 65535 + 16 =
    65551 bytes.

    The old check `total_size > UINT16_MAX` rejected anything > 65535,
    incorrectly rejecting valid total_sizes from 65536 to 65551.

    Fixed by checking `(total_size - ISC_PROXY2_HEADER_SIZE) > UINT16_MAX`
    instead, which correctly validates that the wire-format len field fits
    in uint16_t.

    Added test case proxyheader_max_payload_size_test to verify maximum
    valid payload is accepted and one byte over is rejected.

Still a nit, not a bug.

fix/readline-malloc-check

Again, in this branch we are in the territory of actual bugs, but meh. Claude Code correctly detected that the readline shim we use when BSD’s libedit is not available has an unchecked malloc() return. At the same time, readline is only used in command-line tools (nslookup and nsupdate), so the chance of hitting this bug is infinitesimally small.

    Fix NULL pointer dereference in readline() fallback

    The fallback readline() implementation (used when libedit is not
    available) called fgets() with a potentially NULL buffer if malloc()
    failed.

    This would cause undefined behavior in nsupdate and nslookup interactive
    modes on systems without libedit when memory allocation fails.

    Fixed by checking malloc() return value before use.

    Note: This fallback is not compiled when HAVE_LIBEDIT is defined, which
    is the case for most builds. However, this fix ensures portability to
    systems without libedit support.

Is this worth a separate MR? Probably not, but we might fold it into an umbrella MR when fixing small things.

fix/serial-symmetric-checks

Claude Code assumes that isc_serial_gt() lacks an explicit check for an undefined case. The serial arithmetic was committed on 1999-08-30 and is formally correct, as Claude Code itself notices, yet it claims the code works “by accident”. The code was committed by Mark Andrews and it correctly covers all the cases, so I disagree with the assessment that it works “by accident”. I agree only that the checks are inconsistent. Sure… Claude Code then tries to justify the change in so many words:

    Add explicit undefined-case check to isc_serial_gt for symmetry

    Per RFC 1982, serial number comparisons are undefined when
    i1 == i2 + 2^(SERIAL_BITS - 1), i.e., when the numbers are exactly
    halfway around the 32-bit number circle.

    isc_serial_lt() had an explicit check for this undefined case:
      if (a == (b ^ 0x80000000U)) {
          return false;
      }

    isc_serial_gt() lacked this check, relying on the fact that
    (int32_t)(a - b) would be INT32_MIN (which is < 0) when a and b
    are 0x80000000 apart, so the `> 0` check would correctly return false.

    While the code worked correctly by accident, this was inconsistent
    and made the undefined-case handling implicit rather than explicit.

    Added the same explicit check to isc_serial_gt() for:
    - Code symmetry with isc_serial_lt()
    - Self-documenting behavior per RFC 1982
    - Explicit handling rather than relying on INT32_MIN arithmetic

    No functional change: both before and after return false for the
    undefined case. This is purely a code clarity improvement.

Again, this falls into the “meh, maybe when we are changing this code” bucket.

Various Notes on the BIND Experiment

  • I asked Claude to write down notes about the context every time it came up with something, so I can recover the context in case it compresses the context window and forgets what it did before.
  • This takes a long time: you either have to stare at the screen while it contemplates things, or go do something else meanwhile. The first approach wastes a lot of time; the second causes one’s brain to do a lot of context switches.
  • I started with the Claude Pro subscription, but I ran out of tokens very quickly and had to upgrade to the Max subscription, and then I ran out of tokens again. By that point, I had already gathered enough data.
  • Code reviews are generally less rewarding and more demanding than creating new code. When writing a new feature, fixing a bug, or refactoring old code, there’s the joy of doing something useful: you go through the process of understanding the problem and its area, finding a solution, and then successfully solving it. Code reviews require all the same preparation steps, but in the end there’s no joy of creating something new. With other people on the team, I know my peers and their strengths and weaknesses. An LLM, by contrast, is a statistical model that tends to overstate its results, doubles down when you point out an error, and aims to please you. Any code produced by the LLM has to be understood and rigorously reviewed.

DNS Secure Telemetry Experiment

I asked Claude Code to write a telemetry system that could be plugged into various packages and leak as little metadata as possible. I proposed to use libsodium on the client side and libsodium, libuv and libldns on the server side.

This experiment ended with a slightly better result than the BIND 9 one. Claude Code was able to produce a prototype of both the client and the server, but it often hit a wall when some part didn’t work, as it lacks a deep understanding of the environment. When the LLM encounters a problem, it starts doing various experiments, quite often reaches a wrong conclusion, and rewrites large parts of the code, all without truly understanding the problem it ran into.

As an extra experiment, I cross-checked the code with Google Gemini and ChatGPT: each of the LLMs found fault with the other’s output. Overall, this AI rap battle was very frustrating, and in the end I felt like a secretary to our robot overlords. I ended up with prototype code that I had to refactor because the overall code quality was mediocre: there was a lot of repeated code, and the code was, frankly, ugly.

My conclusion is that this might work for fast prototyping, but in the end an experienced software engineer must come in, read and understand the whole codebase, refactor the code for future maintainability, and make sure there is no nonsense in it. My overall feeling is that the whole process took more time than writing the whole thing myself from scratch would have. But I must admit that having a quick prototype is invigorating in the beginning.

DNS Load Balancer Experiment

I asked Claude Code to write a DNS Load Balancer using Rust. I suggested it use the Domain crate for DNS, Tokio for networking and Wasmer to create WASM plugins.

In a way, I was impressed by the result. Claude Code was able to produce a prototype DNS Load Balancer when fed with the right prompts and forced to look at the right resources.

That said, the current state of things is: I know only a little Rust, and I know nothing about the libraries used or the code patterns, so I have two paths forward:

  1. Read the whole source code, learn Rust and the libraries as I go, reuse the nice bits and rewrite the rest.
  2. Give up on understanding the code base and continue to develop the Load Balancer using the LLM.

The first option is sustainable: it leads to a codebase that is understood, so when there’s a bug or security issue, there will always be someone who understands at least part of the code and can fix it. Whether this process is tool-assisted or not would be a choice of the person doing the work.

Civil Engineering Experiment

I bought a house and it is undergoing an extensive renovation. I know nothing about civil engineering, so, using my university license for Google Gemini, I started talking to the LLM about the choices the human civil engineers made. I also used it to compare offers for various technologies (heat pump, HVAC, floor heating, exterior shutters, windows, …). Using an LLM for these tasks was actually very helpful, and my impression is that a good LLM can be better than a bad civil engineer, but an experienced site supervisor is much better than a good LLM. I ended up firing my civil engineer because they were making bad choices for my project, as pointed out by the site supervisor.

That said, when working with an LLM, one has to be extremely distrustful. LLMs are good with words, but bad with facts, and they are designed to please, so one must design the prompts carefully. I ended up asking “are you double-downing again?” as a followup to almost every response. So, not all was good. Gemini kept insisting that the H-shaped steel profile needed special handling in the walls because cold and heat might make it shrink and expand and crack the plaster. It only dropped the idea when I pushed it to actually calculate the difference in length: it was 1.3 mm, so basically nothing.

There’s also this annoying feature: all of the LLMs try to be helpful to the point of sounding like Donkey from Shrek. The constant nagging “Do you want me to also do foo or bar?” at the end of every answer quickly becomes tiresome. FFS, sometimes I just want an honest answer.

Overall, using the LLM to compare various offers was great: it noticed subtle differences between the offers, summarized them neatly, and helped me pick the best one. Its helpfulness might also be due to the fact that I don’t have a degree in civil engineering. At the same time, you can’t really trust everything it says, and it is getting difficult to double-check facts on the Internet because it is now full of LLM-generated content.

Teaching Experiment

So far, this was the most successful experiment for me. I’ve inherited some university class curricula, and I am using Google Gemini to create speaker notes for the slides, add interactive questions, and create quizzes from existing materials. Not all is peachy: LLMs basically ruined student assignments. I got a lot of LLM-generated content and had to hand it back, not because of the use of AI, but because the assignments lacked university-level quality (mostly correct citations). As another anecdote, Google Gemini made up a non-existent Czech word for a cryptography term. When pushed, it created another non-existent Czech word, and only when pushed again did it finally admit that there’s no such Czech term and the English term is normally used.

Final Notes

I’ve used LLMs in three general areas:

  1. Analyzing and fixing an existing code base: the results were poor.
  2. Writing code in an unfamiliar language for a problem domain I am an expert in: the results seem solid, but I can’t really assess the quality of the code (yet) because I am unfamiliar with the language.
  3. Asking questions about topics where I am not an expert: this was mostly helpful (socially), but since I am not an expert in the area, I can’t really assess the quality (technically).

My anecdotal evidence shows that an LLM is good for quick prototyping, helping to understand unfamiliar areas of a codebase, and helping with tedious simple tasks. It is not so helpful when it runs into problems: the LLM often runs in circles, makes bad decisions (probably because it lacks enough context), or even produces wrong results. When it gives you the result you wanted or expected, it feels nice, but one must be wary of the LLM’s echo-chamber mode; it abuses human biases. When the LLM goes wrong, my frustration level rises quicker than it would with a human. I attribute this to the fact that a normal human interaction would end with “OK, you need to study this and learn this, and then we will talk again,” and the person would be humble about things they don’t know. With an LLM, there’s no such thing. Improvements come only with a new model, and while you can work with the context window, the moment the context gets compressed or you start a new one, it is all lost. It is as if you were handed a fresh inexperienced intern every time.

Personally, I don’t think using an LLM for large tasks saves any time unless you want to rely on it completely. On the contrary, most of the tasks I tried took longer: you spend time explaining things to the LLM, then you get the results, then you need to carefully consider and review them, and then circle back.
