Overview

This manual attempts to cover things that don't logically fit in the man pages. It covers the big-picture items and overall design. It's a cross between a manual and a design document at the moment, although perhaps those functions can be split up at a later date.

This manual is not intended to be an exhaustive reference. For a complete rundown of every configuration and command-line option and its precise technical meaning, see the man pages for gdnsd, gdnsd.config, and gdnsd.zonefile.


Portability Testing

This was done during the run-up to 1.2.0, and hasn't been re-tested by me since.

I've personally run through the process of, basically: ./configure && make check && sudo make install && make installcheck on the following platforms with the following compilers. On all of them, I've also manually started the daemon as root and seen it produce correct syslog output, appear to secure itself, answer a few manual queries, etc.

Of course most platforms required some prep work to get the testsuite dependencies installed, but it wasn't too hard on any of them.

  x86_64:
    Linux - Fedora 12: gcc 4.4.3
    Linux - Fedora 13: gcc 4.4.4
    Linux - Debian Etch: gcc 4.1.2
    MacOS X Snow Leopard: gcc 4.2.1, llvm-gcc 4.2.1, clang 1.0.2
  x86:
    Linux - Fedora 13: gcc 4.4.4
    FreeBSD 8.0: gcc 4.2.1
    FreeBSD 7.2: gcc 4.2.1
    FreeBSD 6.4: gcc 3.4.6
    OpenBSD 4.4: gcc 3.3.5
    MacOS X Snow Leopard: gcc 4.0.1
    OpenSolaris 2009.06: gcc 3.4.3, gcc 4.3.2, Sun Studio C Compiler
  mips32r2 (Big Endian) (Atheros 9132 - Buffalo WZR-HP-G300NH Router):
    Linux - OpenWRT Backfire: gcc 4.3.3 + uClibc 0.9.30.1
      (cross compiled from x86_64 build host using OpenWRT's toolchain)
      (also, I wasn't able to run the testsuite here (vendor perl broken),
       but I did do some simple manual testing)

I think this covers a fair swath of the field of modern-ish POSIXy systems. The Linux and Mac tests were on real hosts, and the rest were done in VMware using standard images found via Google. Given what I've seen so far, I expect there will be more minor issues when people try gdnsd on platforms other than those listed. Please send bug reports; I doubt the issues will be hard to solve at this point. And if you try a significant variation from the above and it works without a hitch, please let me know so I can add it to the list of tested systems.

Most current development/testing is done on latest available releases of MacOS X, Fedora Linux, and/or Ubuntu Linux.


Overall Design

Configuration

The configuration file's basic syntax is handled by vscf, which parses a simple and clean configuration syntax with arbitrary structural depth in the form of arrays and hashes. At one time this was a separate library, but it has been bundled back into gdnsd's distribution at this point. Details of the configuration options are in the man page gdnsd.config(5).
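
As a rough illustration of what that looks like in practice (the option names below are only examples; gdnsd.config(5) documents the real set and the finer points of the syntax), a config file is a hash of key => value pairs, where values can themselves be hashes, arrays, or simple scalars:

   options => {
      listen => [ 192.0.2.1, 192.0.2.2 ]
      http_port => 3506
   }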

Threading

The gdnsd daemon uses pthreads to maximize performance and efficiency, but they don't contend with each other on locks at runtime, and no more than one thread writes to any shared memory location. Thread-local writable memory is malloc()'d within the writing thread and the address is private to the thread.

The "main" thread handles pretty much everything but the actual processing of DNS requests: the monitoring I/O for plugins that use the built-in monitoring, the HTTP server that serves stats data over port 3506 (reconfigurable), and the signal handlers for daemon shutdown and for toggling runtime logging of packet errors.

Currently all of the lengthy startup operations that precede serving DNS requests, such as the parsing of zonefiles and the post-processing of zone data, occur serially in the main thread as well. In the future we may spawn some short-lived worker threads to speed this up, but ultimately they would be joined at a synchronization point before spawning the threads that serve DNS requests in the long term.

There are N threads dedicated to handling UDP DNS requests, one per listening socket. These UDP threads do nothing but sit in a tight blocking loop on the listening socket until they are terminated. They run with all signals blocked, and they make no expensive system calls (other than necessary socket read/write traffic) or heap allocations at runtime. All necessary heap data/buffers are pre-allocated before these threads begin serving requests.
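
Conceptually, each UDP thread's runtime loop amounts to something like the following sketch (this is not the actual gdnsd source; the buffer handling is simplified and process_dns_query() is a made-up stand-in for the shared core request-handling code):

   #include <stdint.h>
   #include <stddef.h>
   #include <sys/types.h>
   #include <sys/socket.h>

   /* Hypothetical stand-in for the shared core request-handling code,
    * which builds the response in-place and returns its length. */
   size_t process_dns_query(uint8_t* pkt, size_t len);

   /* Sketch of a per-socket UDP thread loop: one pre-allocated buffer,
    * blocking socket I/O, and nothing else at runtime. */
   static void udp_io_loop(int sock, uint8_t* pktbuf, size_t bufsize) {
       for (;;) {
           struct sockaddr_storage client;
           socklen_t client_len = sizeof(client);
           ssize_t recvd = recvfrom(sock, pktbuf, bufsize, 0,
                                    (struct sockaddr*)&client, &client_len);
           if (recvd < 1)
               continue; /* the real code also counts/handles errors */
           size_t resp_len = process_dns_query(pktbuf, (size_t)recvd);
           if (resp_len)
               sendto(sock, pktbuf, resp_len, 0,
                      (struct sockaddr*)&client, client_len);
       }
   }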

The scaling of UDP DNS was looked into extensively. Adding more threads per socket never helps: the socket itself is a serialized, locked resource within the operating system, and our DNS code is much faster than the socket operations themselves.

To scale out over multiple cores, if you find that your network interface(s) can pump more data than gdnsd can handle on a single socket/thread, your best scaling option is to have gdnsd listen on several independent sockets (port numbers and/or IPs). Then either use a hardware load balancer or Linux's own ipvsadm software to balance the client requests across those sockets. With ipvsadm you can do it all in software on the same machine gdnsd is running on, with the inbound port 53 traffic spread across N local ports gdnsd is listening on.
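
For example, something along these lines with ipvsadm would round-robin inbound UDP port 53 traffic across two local gdnsd listeners (all addresses and ports here are placeholders; adapt to your own setup):

   ipvsadm -A -u 192.0.2.1:53 -s rr
   ipvsadm -a -u 192.0.2.1:53 -r 127.0.0.1:5301 -m
   ipvsadm -a -u 192.0.2.1:53 -r 127.0.0.1:5302 -m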

Few people will face these scaling limits in practice, and the few that do shouldn't have any problem figuring out ipvsadm, etc.

TCP DNS is threaded and scaled in the same manner. It could have been done differently, but in any sane DNS configuration UDP scaling issues should dwarf TCP ones anyway, and this keeps things simpler.

Each of the TCP DNS threads listens on one socket and uses a libev event loop to handle many connections in parallel. These threads do allocate heap memory dynamically at runtime, and you can limit the count of parallel requests per thread with "tcp_clients_per_socket" (default 128). libev priorities are used to prioritize finishing current requests over accepting new ones when both events are available in the same loop wakeup, which should help keep the parallelism number lower in general.
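
For example, raising that limit would look something like this in the config (same illustrative syntax as shown earlier):

   options => {
      tcp_clients_per_socket => 256
   }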

Statistics

The DNS threads keep reasonably detailed statistical counters of all of their activity. The core dns request handling code that both the TCP and UDP threads use tracks counters for all response types. Mostly these counters are named for the corresponding DNS response codes (RCODEs):

refused - Request was refused by the server because the server is not authoritative for the queried name.
nxdomain - Request was for a non-existent domainname. In other words, a name the daemon is authoritative for, but which does not exist in the database.
notimp - Requested service not implemented by this daemon, such as zone transfer requests.
badvers - Request had an EDNS OPT RR with a version higher than zero, which this daemon does not support (at the time of this writing, such a version doesn't even exist).
dropped - Request was so horribly malformed that we didn't even bother to respond (too short to contain a valid header, unparseable question section, QR (Query Response) bit set in a supposed question, TC bit set, illegal domainname encoding, etc.).
noerror - Request did not have any of the above problems.
v6 - Request was from an IPv6 client. This one isn't RCODE based, and is orthogonal to all other counts above.

The UDP thread(s) keep the following statistics at their own level of processing:

udp_reqs - Total count of UDP requests received and passed on to the core DNS request handling code (this is synthesized by summing all of the core stat counters (other than v6) above for the UDP threads).
udp_tc - Non-EDNS (traditional 512-byte) UDP responses that were truncated with the TC bit set.
udp_edns - Total requests which used EDNS (including bad version ones)
udp_edns_big - Subset of udp_edns where the response was greater than 512 bytes (in other words, EDNS actually did something for you)
udp_edns_tc - Subset of udp_edns where the response was truncated and the TC bit set, meaning that the client's specified edns buffer size was too small for the data requested. gdnsd's own output buffer size is autotuned for your data set, so that is never the limitation.
udp_sendfail - Count of UDP sendmsg() errors, which almost definitely resulted in dropped responses.

The TCP thread(s) also keep the following counters:

tcp_reqs - Total count of TCP requests (again, synthesized)
tcp_recvfail - Count of abnormal failures in recv() on a DNS TCP socket.
tcp_recvsize - Count of TCP connections the daemon aborted early because the sender indicated they were going to send a much bigger request than we're willing to handle (question len >4K).
tcp_sendfail - Count of abnormal failures in send() on a DNS TCP socket.

These statistics are tracked in per-thread structures. The actual data slots are volatile unsigned longs, which helps with rollover on 64-bit machines. The main thread has pointers to read and aggregate the counters from all DNS threads for reporting - hence our portability assumption that aligned "unsigned long" reads and writes are atomic and cache-coherent. There's only one writer to any given stats variable, but another thread needs to be able to read the values sanely. (This isn't the only place we use assumed-atomic volatile unsigned longs; see the section about resource state monitoring later in this document.)
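
The pattern is roughly the following simplified sketch (not the actual gdnsd structures or names):

   #include <stddef.h>

   /* Correctness relies on aligned "unsigned long" loads/stores being
    * atomic and cache-coherent on the target platform. */
   typedef struct {
       volatile unsigned long noerror;
       volatile unsigned long refused;
       volatile unsigned long nxdomain;
       /* ... one slot per counter described above ... */
   } dns_stats_t;

   /* DNS thread: the sole writer for its own stats struct */
   static void stat_inc_noerror(dns_stats_t* s) {
       s->noerror++;
   }

   /* Main thread: only ever reads the DNS threads' structs */
   static unsigned long stats_total_noerror(dns_stats_t* const* all, size_t n) {
       unsigned long sum = 0;
       for (size_t i = 0; i < n; i++)
           sum += all[i]->noerror;
       return sum;
   }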

The main thread reports the statistics in two different ways. The first is via syslog every log_stats seconds (default 3600), as well as always at exit time. The other is via an embedded HTTP server which listens by default on port 3506. The HTTP server can give the data in both html (for humans) and csv (for monitoring tools) formats. All of the stats reporting code is in statio.c.

Packet Error Logging

Many of the error conditions which are tracked by the statistical counters also have associated per-request syslog messages, which are disabled by default to avoid logfile spam attacks.

You can enable the packet error logging at runtime by either sending the daemon SIGUSR1, or letting gdnsd do that for you by executing:

   gdnsd -c /my/config/file lpe_on

Similarly, they can be turned back off with SIGUSR2 or the gdnsd command "lpe_off".
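
Equivalently, if you know the daemon's pid you can send the signals directly (the pidfile path below is only a placeholder):

   kill -USR1 `cat /var/run/gdnsd.pid`   # enable packet error logging
   kill -USR2 `cat /var/run/gdnsd.pid`   # disable it again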

Truncation Handling and other related things

gdnsd's truncation handling follows the simplest valid set of truncation rules. That is: it drops whole RR sets (without setting the TC bit) in the case of being unable to fit all the desirable additional records into the Additional section, and in the case that Answer or Authority records don't fit, it returns an empty (other than perhaps an EDNS OPT RR) packet with the TC bit set. The space for the EDNS OPT RR is reserved from the start when applicable, so it will never be elided to make room for other additional records. Nameserver address RRs for delegation glue are considered part of the required set above (i.e. if they don't fit, the whole packet will be truncated w/ TC, even though they go in the Additional section).

Also, by default, unnecessary lists of NS records in the Authority section are left out completely (they're really only necessary for delegation responses). This behavior can be reversed (to always send appropriate NS records even when not strictly necessary) via the include_optional_ns option.


Zone file syntax

Zone file syntax generally follows the RFCs for standard zone files, which means they are mostly BIND-compatible. They do not support the BIND extension $GENERATE, and they do not require that the SOA record be the first record in the file.
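
For reference, a minimal zonefile in this standard syntax looks something like the following (all names, addresses, and timer values are just illustrative):

   ; zonefile for example.com
   $TTL 86400
   @     IN SOA  ns1.example.com. hostmaster.example.com. (
                   1 7200 1800 259200 900 )
         IN NS   ns1.example.com.
   ns1   IN A    192.0.2.53
   www   IN A    192.0.2.80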


DNS Database Validation

gdnsd is very picky about the set of zones it is willing to load. This is no accident. By requiring sanity in the database, the runtime code is made simpler because it has fewer ambiguous corner cases to deal with when trying to formulate a response from the database. During internal testing, it was not uncommon to see gdnsd rejecting data that BIND was loading just fine. Usually the reasons were pretty good. In one conversion it found a duplicate CNAME record, a backup MX record that pointed at a non-existent hostname in the same domain, and a bogus PTR record caused by a missing trailing dot.

This is a list of all of the important validations gdnsd does, as well as other related gotchas of the "Hey BIND did this, but it's broken in gdnsd" type:

All names (the left-hand-side of the records in the zonefiles) must be within the correct zone for the file they're in, no exceptions.

CNAME records are not allowed to co-exist with any other data at the same name. This is just basic DNS stuff; I think even BIND must enforce this.

When target names (hostnames on the right-hand-side of an NS, MX, SRV, or PTR record) are within the authority of the same gdnsd server (even in an "unrelated" zone), they will be required to exist and have an A record. An implication of this is that you can't target any of these records at CNAMEs either. There's nothing we can do about targets outside of our authority, of course.

Subzone delegation NS records where the NS target name is within a delegated subzone of the current zone must have glue addresses, in the form of A and/or AAAA records in the same zonefile (see the example fragment after this list). These glue records will never be considered part of the real zone data or seen by explicit queries. They will only be used to fill in NS glue in the additional section of the delegation responses.

NS glue for any other case (NS target name is in a completely independent zone served by us, or by an external nameserver) is not allowed, as no secure cache would believe us anyway.

Best practice on NS records is that the target name on the right-hand-side should be within the zone specified on the left-hand-side (or at the very least, served by the same authority) when possible and practical to do so. This advice is not gdnsd-specific.
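
To illustrate the delegation and glue rules above, a fragment like this in the example.com zonefile delegates sub.example.com and supplies the required glue address (names and addresses are hypothetical):

   sub       IN NS  ns1.sub.example.com.
   ns1.sub   IN A   192.0.2.1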


Security

Any public-facing network daemon has to consider security issues. While the potential will always exist for gdnsd to contain stupid buffer overflow bugs and the like, I believe the code to be reasonably secure.

I regularly audit the code as best I can, both manually and with tools like valgrind, to look for stupid memory bugs. Another point in its favor is the fact that, being a purely authoritative server, gdnsd has no reason to believe anything anyone else on the network has to say about anything. This eliminates entire classes of attacks related to poisoning and the like. gdnsd never sends DNS queries (even indirectly via gethostbyname()) to anyone else. It's a DNS server, not a DNS client.

Perhaps more importantly, gdnsd doesn't trust itself to be root on your machine. Any time gdnsd is started as root, it will drop privileges to those of the user named gdnsd (configurable) and chroot into an empty, unwritable, root-owned directory (path configurable) before answering any network queries.

If it can, gdnsd will create the chroot dir and modify its permissions and ownership as necessary before dropping privileges. If any security-related operation fails, the daemon will refuse to start and will abort with a log message indicating the problem.
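
The underlying sequence is the standard POSIX one; a stripped-down sketch (not the actual gdnsd code, with the error handling reduced to a bare exit) looks like:

   #include <stdlib.h>
   #include <sys/types.h>
   #include <unistd.h>

   /* Sketch of the privilege-drop sequence described above; the real
    * code also creates/fixes the chroot dir first and logs the specific
    * failure before exiting. */
   static void secure_daemon(const char* chroot_path, uid_t uid, gid_t gid) {
       if (chroot(chroot_path) || chdir("/"))
           exit(EXIT_FAILURE);   /* refuse to run unsecured */
       if (setgid(gid) || setuid(uid))
           exit(EXIT_FAILURE);
   }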

While this doesn't erase other security concerns, it certainly helps in minimizing the potential impact of any future remote exploit that may be discovered in the gdnsd code.

If your host supports randomized address layouts for executables (-fPIE in gcc terms), gdnsd can be built that way for additional security. This is not the default, but you can simply add the flag to your CFLAGS when building.
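
With a typical autoconf-based build that amounts to something like the lines below. Note that the -pie linker flag is an extra assumption on my part here, since producing a position-independent executable generally also requires it at link time:

   ./configure CFLAGS="-fPIE" LDFLAGS="-pie"
   make && sudo make install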


Plugin Architecture for Dynamic Address/CNAME Resolution

gdnsd supports building separate plugins as shared objects. These can be used to dynamically provide A, AAAA, and CNAME RR-set data to the backend of gdnsd. Plugins can be used to accomplish things like geographic load balancing and/or failover.

From a user point of view, there are two special RR types in gdnsd zonefiles: DYNA for addresses and DYNC for CNAMEs. The basic idea is that you do things like this in the zonefile:

   www   300 DYNA plugin_name!resource_name
   admin 300 DYNC plugin_name!resource_name

The named plugin will get a request to give an address for the given resource name. The plugin has access to the DNS client's IP address, and can also modify the TTL specified in the original zonefile record for each response if it chooses to do so. These RRs are used by the gdnsd core code anywhere a regular RR would be used (so, for example, a SRV query that includes A-records as additional data would serve those A-records from the plugin if applicable).

The API and build process for plugins is documented separately in gdnsd-plugin-api(3). The documentation isn't very complete yet, but looking at existing plugins should fill in any gaps.


Benchmarks

No respectably well-defined benchmarks have been run on the new gdnsd code, but suffice it to say that UDP request handling is extremely efficient. The network stack seems to be the bottleneck for requests over the loopback interface on a single socket, at rates in the 50-80K/sec neighborhood depending on various local factors. Scaling over multiple sockets is nearly linear if you have the CPU cores and the network hardware for it. There is ongoing work to improve these numbers under Linux, but it will likely be a little while before appropriate kernels (and gdnsd versions that take advantage of them) are available.

The last set of benchmarks were done on 1.2.0, and haven't been updated yet for newer releases.


See Also

The manpages: gdnsd(8), gdnsd.config(5), gdnsd.zonefile(5), gdnsd-plugin-api(3), gdnsd-curvekey(1)


Copyright and License

Copyright (c) 2011 Brandon L Black <blblack@gmail.com>

This file is part of gdnsd.

gdnsd is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

gdnsd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with gdnsd. If not, see <http://www.gnu.org/licenses/>.