User Commands NDIFF(1) NAME ndiff - compare putatively similar files, ignoring small numeric differences SYNOPSIS ndiff [ -? ] [ -abserr abserr ] [ -author ] [ -copyright ] [ -fields n1a-n1b,n2,n3a-n3b,... ] [ -help ] [ -logfile filename ] [ -minwidth nnn ] [ -outfile filename ] [ -precision number-of-bits ] [ -quick ] [ -quiet ] [ -relerr relerr ] [ -separators regexp ] [ -silent ] [ -version ] [ -www ] infile1 infile2 DESCRIPTION When a numerical program is run in multiple environments (operating systems, architectures, or compilers), assessing its consistency can be a difficult task for a human, since small differences in numerical output values are expected. Application of a file differencing utility, such as POSIX/UNIX diff(1), will generally produce voluminous out- put, often longer than the original files, and is thus not useful. The lesser-known UNIX spiff(1) utility, while capa- ble of handling numeric fields, suffers from excessively- long running times, and often terminates prematurely. ndiff provides a solution to this problem. It compares two files that are expected to be identical, or at least, numer- ically similar. It assumes that lines consist of whitespace-separated fields of numeric and non-numeric data. A hyphen (minus sign) can be used in place of either input filename to represent stdin, allowing one input stream to come from a UNIX pipe. This is a common, but by no means universal, idiom in UNIX software as a workaround for the regrettable lack of standard names for the default stdin and stdout streams. On some, but not all, UNIX systems, stdin can be named explicitly as /dev/stdin or /dev/fd/0. The default field separator characters can be modified with the -separators regexp command-line option, so that ndiff can also handle files with, e.g., parenthesized complex numbers, and comma-separated numbers from Fortran list- directed output. However, because line breaking and use of repeats counts in Fortran list-directed is implementation dependent, such files are not really suitable for cross- implementation file comparisons, unless the lists are kept short enough to fit on a single line. ndiff expects the files to contain the same number of lines; otherwise, a diagnostic will be issued. Unlike diff(1), this program cannot handle inserted or deleted lines. Version 2.00 Last change: 09 December 2000 1 User Commands NDIFF(1) Also unlike diff(1) (unless diff's -b and -w options are used), whitespace is not significant for ndiff, except that it normally separates fields. Lines that differ in at least one field (as determined by the absolute and/or relative tolerances, for numeric values, or string comparisons otherwise) are reported on stdout in a diff(1) -style listing of the form nnncnnn < line from infile1 --- field n absolute error x.xxe-xx relative error x.xxe-xx [nn*(machine epsilon)] > line from infile2 The first of these lines shows the line number twice, separated by the letter c (for change). The second and fourth lines begin with a two-character identifying prefix. The third, separator, line shows the field number at which the difference was found; fields beyond that one may also differ, but have not been checked. If the differing field is numeric, then the errors found are also shown on that line. If the relative error is not too big, its value is also shown as a multiple of the machine epsilon. ndiff recognizes the following patterns as valid numbers. In the patterns, # is a string of one or more decimal digits, optionally separated by a nonsignificant underscore (as in the Ada programming language), s is an optional + or - sign, and X is an exponent letter, one of D, d, E, e, Q, or q: s# s#s# s#Xs# s#. s#.s# s#.Xs# s#.# s#.#s# s#.#Xs# s.# s.#s# s.#Xs# The rigorous programming rule that determines whether a string is interpreted as a floating-point value is that it must match this very complicated regular expression (the line breaks are for readability only): "^[-+]?([0-9](_?[0-9])*([.]?([0-9](_?[0-9])*)*)?| [.][0-9](_?[0-9])*+) ([DdEeQq]?[-+]?[0-9](_?[0-9])*)?$" Thus, 123, -1q-27, .987d77, 3.14159_26535_89793_23846, and .456-123 are all valid numbers. Notably absent from this list are Fortran-style numbers with embedded blanks (blanks are not significant in Fortran, except in string constants). If your files contain such data, then you must convert them to standard form first, if you want ndiff to perform reliably. In the interests of interlanguage data exchange, most modern Fortran implementa- tions do not output floating-point numbers with embedded spaces, so you should rarely need such file conversions. Version 2.00 Last change: 09 December 2000 2 User Commands NDIFF(1) From version 2.00, ndiff also recognizes patterns for optionally-signed NaN (Not-a-Number): NaN SNaN QNaN NaNS NaNQ ?.0e+0 ??.0 +NaN +SNaN +QNaN +NaNS +NaNQ +?.0e+0 +??.0 -NaN -SNaN -QNaN -NaNS -NaNQ -?.0e+0 -??.0 and optionally-signed Infinity: Inf Infinity +.+0e+0 +.+0 +Inf +Infinity +.+0e+0 +.+0 -Inf -Infinity -.-0e+0 -.-0 Lettercase is not significant in these values. The rigorous programming rule for whether a field is a NaN or an Infinity is determined by these complex regular expressions (again, the line breaks are for readability only): "^[-+]?([QqSs]?[Nn][Aa][Nn][QqSs]?| [?]+[.][?0]+[DdEeQq][-+]?[0-9]+| [?]+[.][?0]+)$" "^(-[Ii][Nn][Ff]| -[Ii][Nn][Ff][Ii][Nn][Ii][Tt][Yy]| -+[.][-]0+[DdEeQq][-+]?[0-9]+| -+[.][-]0+)$" "^([+]?[Ii][Nn][Ff]| [+]?[Ii][Nn][Ff][Ii][Nn][Ii][Tt][Yy]| [+]+[.][-]0+[DdEeQq][-+]?[0-9]+| [+]+[.][-]0+)$" Even though in numerical computations, a NaN is never equal to anything, even itself, for ndiff, fields that match a NaN pattern are considered equal. Fields that match Infinity patterns are considered equal if they have the same sign. ndiff terminates with a success exit code (on UNIX, 0) if no differences (subject to the absolute and/or relative toler- ances) are found. Otherwise, it terminates with a failure exit code (on UNIX, 1). OPTIONS Command-line options may be abbreviated to a unique leading prefix, and letter case is ignored. To avoid confusion with options, if a filename begins with a hyphen, it must be disguised by a leading absolute or rela- tive directory path, e.g., /tmp/-foo.dat or ./-foo.dat. Version 2.00 Last change: 09 December 2000 3 User Commands NDIFF(1) GNU- and POSIX-style options of the form --name are also recognized: they begin with two option prefix characters. -? Display brief usage information on stderr and exit with a success status code before processing any input files. This is a synonym for -help. -abserr abserr Specify a maximum absolute difference per- mitted before fields are regarded as dif- ferent. Unless the fields are all of the same approximate magnitude, you probably do not want to use this option. A zero value for this option suppresses reports of absolute error differences. This option may be abbreviated -a. For readability, this option may also be called -absolute-error, or any unique pre- fix thereof. -author Show author information on stderr and exit with a success status code before process- ing any input files. -copyright Show copyright information on stderr and exit with a success status code before processing any input files. -fields n1a-n1b,n2,n3a-n3b,... By default, all fields are compared, but this option can specify a comma-separated list of numbers, and/or ranges, selecting the fields that are to be compared. Fields are numbered starting from 1. A field range is a pair of numbers, separated by one or more hyphens (minus signs): 4-7 and 4--7 are equivalent to 4,5,6,7. To prevent long range-expansion loops, field ranges are restricted to a non- negative span of no more than 100: 8-8 and 1-100 are acceptable, but 3-, -5, 8-7 and 1-101 all generate an error. -help Display brief usage information on stderr Version 2.00 Last change: 09 December 2000 4 User Commands NDIFF(1) and exit with a success status code before processing any input files. This is a synonym for -?. -logfile filename Redirect warning and error messages from stderr to the indicated filename. This option is provided for user convenience on poorly-designed operating systems (e.g., IBM PC DOS) that fail to provide for redirection of stderr to a specified file. This option can also be used for discard- ing messages, with, e.g., on UNIX systems, -logfile /dev/null. -minwidth nnn Specify a minimum field width required for numeric fields containing a decimal point and/or exponent. If both such fields being compared are shorter than this, they are treated as equal. This option is useful when fields contain relative error values given to only a few digits; such values might differ widely between two files, but those differences can be made irrelevant by invoking this option. For readability, this option may also be called -minimum-width, or any unique pre- fix thereof. -outfile filename Redirect output from stdout to the indi- cated filename. This option is provided for user convenience on operating systems that fail to provide for redirection of stdout to a specified file. -precision number-of-bits Specify the number of bits in the signifi- cands used in multiple-precision arith- metic. The corresponding number of decimal digits is floor( number-of-bits / lg 10) = floor(number-of-bits / 3.32). You can use the -version option to see the value of the corresponding machine epsilon (the smallest number, which, when added to one, still differs from one). The multiple-precision arithmetic library Version 2.00 Last change: 09 December 2000 5 User Commands NDIFF(1) used by ndiff increases its working preci- sion in multiples of a certain implementation-dependent size, usually 64 bits, so the reported machine epsilon may not decrease until number-of-bits has been increased beyond the next multiple of that size. If ndiff was compiled without support for multiple-precision arithmetic, use of this option will elicit a warning. -quick Suppress reading of the initialization files, $LIBDIR/.ndiffrc, $HOME/.ndiffrc, and ./.ndiffrc. LIBDIR represents the name of the ndiff installation directory; it is not a user-definable environment variable. Normally, the contents of those files, if they exist, are implicitly inserted at the beginning of the command line, with com- ments removed and newlines replaced by spaces. Thus, those files can contain any ndiff options defined in this documenta- tion, either one option, or option/value pair, per line, or with multiple options per line. Empty lines, and lines that begin with optional whitespace followed by a sharp (#) are comment lines that are discarded. If the initialization file contains backslashes, they must be doubled because the text is interpreted by the shell before ndiff sees it. -quiet The maximum absolute and relative errors, and their locations, in matching lines are tracked, and at termination, a two-line report with their values is normally printed on stdout. This option suppresses that report. This option may be abbreviated -qui, -qu, or -q. -relerr relerr Specify a maximum relative difference per- mitted before fields are regarded as dif- ferent. The relative error of two fields x and y is defined to be: Version 2.00 Last change: 09 December 2000 6 User Commands NDIFF(1) 0 if x is identical to y, or else abs(x-y)/min(abs(x),abs(y)) if x and y are nonzero, or else 1 if x is zero, and y is nonzero, or else 1 if y is zero, and x is nonzero, or else 0 since both x and y are zero. This complex definition of relative error ensures that the results will be indepen- dent of the order of the two input files on the command line. A zero value for this option suppresses reports of relative error differences. For readability, this option may also be called -relative-error, or any unique pre- fix thereof. If neither -abserr nor -relerr is speci- fied, then -relerr x is assumed, where x is the larger of 1.0e-15 and eight times the machine epsilon (the smallest number whose sum with 1.0 still differs from 1.0). If the specified relative error value is greater than or equal to 1.0, it is multi- plied by the machine epsilon. Thus, you can specify -relerr 16 to allow relative errors of up to 4 bits (since 2^4 == 16). ndiff will issue a warning if you specify a relative error value smaller than the machine epsilon, but will accept and use Version 2.00 Last change: 09 December 2000 7 User Commands NDIFF(1) your specified value. -separators regexp The argument is an awk(1) regular expres- sion that specifies an alternate set of characters separating fields in input lines. By default, this is a single blank, which has a special meaning in awk(1): leading and trailing whitespace (blanks and tabs) is first stripped, then runs of consecu- tive whitespace are collapsed to a single space, and finally, the line is split into fields at the spaces. If the input files contain parenthesized complex numbers, or comma-separated numbers from Fortran list-directed output, then you should specify -separators '[ \t,()]' so that blanks, tabs, commas, and parentheses separate input fields. -silent Suppress the output of the difference lines on stdout. Using both -quiet and -silent guarantees that nothing is printed on stdout, but the ndiff exit code can still be used for testing for a successful comparison. This option may be abbreviated -s. -version Show version and precision information on stderr and exit with a success status code before processing any input files. The machine epsilon reported in this out- put may depend on a preceding -precision number-of-bits specification. -www Show the World-Wide Web master archive location for this program on stderr and exit with a success status code before processing any input files. CAVEATS This implementation of ndiff can be built with support for double-precision, quadruple-precision, or multiple-precision arithmetic. The -version option reports the particular choice at your site. Thus, ndiff will not correctly handle absolute and relative error tolerances that are smaller than Version 2.00 Last change: 09 December 2000 8 User Commands NDIFF(1) those corresponding to the machine epsilon in the arithmetic for which it was built, and for that reason, installers are encouraged to build the multiple-precision version, so that users can select any required precision. WISH LIST It would be nice to have ndiff's abilities incorporated into the GNU diff(1) program; that way, numeric fields could be successfully compared even in files with inserted or deleted lines, and much of the entire computing world could benefit. Perhaps some community-minded and clever reader of this documentation will take up this challenge, and present the Free Software Foundation with an improved diff(1) implemen- tation that offers support for tolerant differencing of numeric files, using ndiff as a design model, sample imple- mentation, and testbed! Ideally, such an improved diff(1) implementation should han- dle numbers of arbitrary precision, allowing comparisons of numeric output from systems that support high-precision arithmetic, such as Lisp and symbolic algebra languages. In addition, it might choose to do its arithmetic in decimal floating-point, so as to avoid inaccuracies introduced by vendor-dependent libraries for decimal-to-native-base number conversion. The awk(1) prototype version of ndiff supports only double- precision arithmetic; the C version is more flexible. FILES In the following, LIBDIR represents the name of the ndiff installation directory; it is not a user-definable environ- ment variable. If ndiff has been installed properly at your site, the value of LIBDIR is /usr/local/share/lib/ndiff/ndiff-2.00 $LIBDIR/.ndiffrc System-specific initialization file con- taining customized ndiff command-line options. $HOME/.ndiffrc User-specific initialization file con- taining customized ndiff command-line options. ./.ndiffrc Current-directory-specific initialization file containing customized ndiff command-line options. $LIBDIR/ndiff.awk awk(1) program invoked by ndiff. This file will not be installed if the C ver- sion of ndiff was built. Version 2.00 Last change: 09 December 2000 9 User Commands NDIFF(1) SEE ALSO awk(1), bawk(1), cmp(1), diff(1), gawk(1), mawk(1), nawk(1), spiff(1). AUTHOR Nelson H. F. Beebe Center for Scientific Computing University of Utah Department of Mathematics, 322 INSCC 155 S 1400 E RM 233 Salt Lake City, UT 84112-0090 USA Email: beebe@math.utah.edu, beebe@acm.org, beebe@computer.org, beebe@ieee.org (Internet) WWW URL: http://www.math.utah.edu/~beebe Telephone: +1 801 581 5254 FAX: +1 801 585 1640, +1 801 581 4148 AVAILABILITY ndiff is freely available; its master distribution can be found at ftp://ftp.math.utah.edu/pub/misc/ http://www.math.utah.edu/pub/misc/ in the file ndiff-x.yy.tar.gz where x.yy is the current ver- sion. Other distribution formats are usually available at the same location. That site is mirrored to several other Internet archives, so you may also be able to find it elsewhere on the Internet; try searching for the string ndiff at one or more of the popular Web search sites, such as http://altavista.digital.com/ http://search.microsoft.com/us/default.asp http://www.dejanews.com/ http://www.dogpile.com/index.html http://www.euroseek.net/page?ifl=uk http://www.excite.com/ http://www.go2net.com/search.html http://www.google.com/ http://www.hotbot.com/ http://www.infoseek.com/ http://www.inktomi.com/ http://www.lycos.com/ http://www.northernlight.com/ http://www.snap.com/ http://www.stpt.com/ http://www.yahoo.com/ Version 2.00 Last change: 09 December 2000 10 User Commands NDIFF(1) COPYRIGHT ######################################################################## ######################################################################## ######################################################################## ### ### ### ndiff: compare putatively similar files, ignoring small numeric ### ### differences ### ### ### ### Copyright (C) 2000 Nelson H. F. Beebe ### ### ### ### This program is covered by the GNU General Public License (GPL), ### ### version 2 or later, available as the file COPYING in the program ### ### source distribution, and on the Internet at ### ### ### ### ftp://ftp.gnu.org/gnu/GPL ### ### ### ### http://www.gnu.org/copyleft/gpl.html ### ### ### ### This program is free software; you can redistribute it and/or ### ### modify it under the terms of the GNU General Public License as ### ### published by the Free Software Foundation; either version 2 of ### ### the License, or (at your option) any later version. ### ### ### ### This program is distributed in the hope that it will be useful, ### ### but WITHOUT ANY WARRANTY; without even the implied warranty of ### ### MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ### ### GNU General Public License for more details. ### ### ### ### You should have received a copy of the GNU General Public ### ### License along with this program; if not, write to the Free ### ### Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, ### ### MA 02111-1307 USA. ### ######################################################################## ######################################################################## ######################################################################## Version 2.00 Last change: 09 December 2000 11