RELIABLE CODE
Editor: Gerard J. Holzmann, NASA/JPL, [email protected]

Points of Truth

Gerard J. Holzmann

YOU'VE NO DOUBT HEARD the saying, “A man with a watch knows what time it is. A man with two watches is never sure.” This is often called Segal's law, which credits a certain Lee Segall (his name actually has two l's) with making this observation sometime in 1961. Others have found a still earlier use in a 1930 newspaper article.1

If you're an engineer, you probably wouldn't be confused by having two watches. If both seemed to be working, you'd take the average of the two readings. If one stopped, you'd use the other, and if both stopped, you'd look for a third. But this would work only if you had both watches in your possession. It would be different if you had one watch and had to coordinate with someone else who had the second (unsynchronized) watch. In that case, you'd want both parties to use a single common reference for the time.

The necessity of a single common reference is enshrined in a software-engineering principle that's so clear you need to hear it only once. Once you've heard it, you'll forever suffer the frustration that you're not quite able to live up to it. The SPOT, or Single Point of Truth, principle says that you should specify key pieces of information in one and only one place in your code. Any related information should be derived from that single source and not stored separately from it. The key information could be a data item, a procedure, or an interface definition. The benefit is that if you need to change something, you can change it at that one place and be sure that all derived information is updated as well.

The principle is popular enough that it has several names. For instance, you can find references to Single Source of Truth (SSOT), Single Version of Truth (SVOT), Don't Repeat Yourself (DRY), and the more ferocious Duplication Is Evil (DIE).

Spotting Bugs

The dangers of not adhering to the SPOT principle are easy to show. I'll start with a simple example that can bite, especially in larger code bases. It can cause bizarre run-time errors that can be hard to pin down. Consider the following little C program, which I call spot1.c:

    #include <stdio.h>

    float a;
    extern void fct(void);

    int main(void)
    {
        a = 3.14;
        fct();
        printf("a=%f\n", a);
        return 0;
    }

The program uses one externally defined function, fct(), that will access the globally declared variable a. After the external call to fct() completes, the program simply prints the value of a. The function fct() is defined in the file spot2.c, which declares a as an external variable and accesses it as follows:

    extern int a;

    void fct(void)
    {
        a = 314/100;
    }

Now you've noticed, mostly because there are just a few lines of code to look at here, that the second file got the type of a slightly wrong. Surely the compiler will catch this, right? Let's find out:

    $ gcc -Wall -pedantic -o spot spot1.c spot2.c
    $ ./spot
    a=0.000000

For good measure, we compiled this program with all warnings enabled, and in pedantic mode, so that we can also learn about the smallest possible problems the compiler might detect. But the compiler issues no warnings. The problem is that the inconsistency in the type of a could be detected only by the linker, which sees both files, not by the compiler, which processes each file separately. Because the linker performs no type checking, the mismatch slips through, with potentially grave consequences.
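Why 0.000000? The column doesn't spell this out, so here's my reading of what likely happens on a common platform (strictly speaking, the mismatched declarations make the behavior undefined): spot2.c computes 314/100 in integer arithmetic, which yields 3, and stores that int bit pattern into the four bytes that spot1.c reserved for the float a. Reinterpreted as an IEEE-754 single-precision float, the pattern 0x00000003 is a tiny subnormal number, which printf("%f") rounds to 0.000000. A minimal sketch of that reinterpretation, assuming int and float are both four bytes:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        int i = 314/100;             /* integer division yields 3 */
        float f;

        /* copy the raw bytes of the int into the float, mimicking
           what the mismatched extern declaration causes in spot2.c */
        memcpy(&f, &i, sizeof f);
        printf("f=%f (%g)\n", f, f); /* prints 0.000000 (about 4.2e-45) */
        return 0;
    }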
I've also tried several commercial static-source-code-analysis tools on this small example program, and none reported the coding error either. The real root of the problem is that I specified the type of a twice, violating the SPOT principle. One solution is to introduce a header file, say spot.h, and move the external declaration of a into it:

    #ifndef SPOT_H
    #define SPOT_H
    extern float a;
    #endif

This header file is now included at the start of both spot1.c and spot2.c, which lets the compiler catch the bug reliably. The variable is now declared in only one place in the program. The file spot1.c also contains the variable's definition, which allocates memory for it, but in a place where the type of a in the definition can be checked against the type in the earlier external declaration. The external declaration acts as the single point of truth for the type of a.

As another example of how developers all too often ignore the SPOT principle, consider this code fragment:

    const float pi = 3.14159265358979;
    #define ten_pi 31.4159265358979

The first line defines the constant variable pi and initializes it, up to some suitable level of precision. Then, possibly in a different file, or a few thousand lines away from this first declaration, we might see a macro definition for a value that's 10 times the value of pi. Clearly, we now have two sources of information for the value of the single numerical constant π. Don't be surprised if the Boolean expression

    (ten_pi/10 == pi)

now yields the result false. A better solution would be to replace the macro definition with, for instance,

    #define ten_pi (10*pi)

which restores the single point of truth.

We programmers routinely violate single points of truth in many ways. If, for instance, you take the address of a variable or function, and in doing so make the object that's pointed to accessible under a different name than in the original declaration, you've indirectly broken the SPOT principle. It's not the computer that gets confused in this case; it's your peer who must maintain and possibly repair your code when problems show up, long after you've moved on.

Truth in Documents

Despite the SPOT principle's appealing simplicity, it's violated almost everywhere. Out of curiosity, I tried to check how much duplication there was among the files on my computer. That was easy enough. Using the md5sum tool, I computed 128-bit MD5 hashes of all regular files in my file system. I then sorted the output by hash value to detect duplicates. Shockingly, I discovered that over the years, I had accumulated close to a million files. What were they all for? I was also shocked to find that approximately 85,000 of those files had at least one duplicate.
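The column doesn't show the exact commands, but a pipeline along the following lines (my reconstruction, assuming GNU coreutils) is enough to reproduce the experiment:

    $ find . -type f -exec md5sum {} + > hashes.txt
    $ sort hashes.txt | uniq -w32 --all-repeated=separate

Each md5sum output line starts with a 32-character hash, so sorting groups identical files together; uniq -w32 compares only the hash, and --all-repeated=separate prints every member of each duplicate group, with the groups separated by blank lines.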
Now, somewhat reassuringly, approximately 10,000 of those files matched only because they were empty or contained just a single line-feed character (it happens). Also, many of the other matching files were autogenerated by-products of various tools. More worrisome was that a few hundred of the duplicated files were larger than a megabyte. That small exercise was an easy way to reclaim disk space.

Truly duplicate information isn't the real problem, though. Duplication merely sets a trap for a more vicious problem. The trap closes when one or more of the copies are updated, each with slightly different new information. Consider those three copies of your phone and address list that live on your work computer, your laptop, and the flash drive in your pocket. Most likely, these three files differ, each having been updated at separate points. How do you merge them back together? Detecting and resolving this type of near duplication is much harder. Once you lose the single point of truth, restoring it can be exceedingly difficult.

The near-duplication problem also occurs in industry. In the design of large software systems, system engineers often document the original design requirements in spreadsheets. Those spreadsheets later find their way into a requirements database, but the original spreadsheets are rarely discarded at that point as the now obsolete duplicates they are. From the spreadsheet or database, yet more copies are created in software design documents, PowerPoint presentations, and test requirement documents. After enough time passes, nobody knows which version of a document is current or can trace back what changes were made in all the versions. Sometimes this means that if you want to know how a requirement is satisfied, it's best to look in the code itself. This then means that, ultimately, the software developer and tester, not the system engineer, decide how to interpret a requirement. As you can imagine, this doesn't always end well.

Cloning

Large organizations often reuse software modules from one project to another, frequently by cloning and modifying the earlier code. Defect fixes in the original version don't always reach the clones, and vice versa, because the new projects are typically under different chains of management and work from a different set of design documents. And yes, all those new design documents, more often than not, also start their lives as clones of earlier project documents that are modified along the way. What happens in all these cases is that an informal social process subtly interferes with a more rigorous engineering process.

Many organizations have learned the hard way that reuse can't be an afterthought. For software to become eligible for reuse, it should be designed and documented for reuse from the start. You'd never rewrite a library of trig functions or sorting routines without good reason. If the library is properly designed and mathematically rigorous, it shouldn't need to be rewritten. If, nonetheless, a defect is discovered in a key library, any updates should be directed to just a single master copy of the code. A library that's designed for reuse in this way can indeed form a reliable single point of truth across multiple projects.

Interface Design

I once heard someone say that the Unix system architecture is a good example of an object-oriented design. What that person probably meant was how Unix standardizes the interfaces to peripheral devices and defines the default interface between application programs. In both cases, the interface is that of a simple byte stream. Unix programs are often designed to read text input from the standard input channel and write text to the standard output channel.
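The canonical shape of such a program is a simple filter: read a byte stream from standard input, transform it, and write the result to standard output. A minimal sketch (my example, not from the column) that upper-cases whatever flows through it:

    #include <stdio.h>
    #include <ctype.h>

    /* A Unix-style filter: it assumes nothing about where its input
       comes from or where its output goes, so it can be combined
       with any other program through a pipe. */
    int main(void)
    {
        int c;

        while ((c = getchar()) != EOF)
            putchar(toupper(c));
        return 0;
    }

Compiled to, say, upper, it can be dropped into a pipeline such as ls | ./upper without either program knowing anything about the other.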
Because of this uniform interface definition, at least in the early days, most Unix applications could read the output of any other application and produce output that other applications could process. The definition of a standard interface lets you build complex applications from small parts that can be connected without much effort.

Calling this approach object-oriented might be a misnomer, though. You could just as easily argue that it's the opposite. By standardizing interfaces on the least common denominator in the system, namely text, the OS avoids having to define specialized interfaces for each different type of object that may be handled. The elegance of this standardization shows most clearly in the handling of peripheral devices, which are by default instrumented to produce or consume byte streams. This shields users from the details of the device interfaces and lets them think about each device as if it's just a regular file. The details are "hidden" behind the interfaces. But the key idea isn't data hiding either, as Dave Parnas described it.2 The key idea is the localization of data: creating single points of truth.

References
1. B. Popik, The Big Apple, blog, 4 Nov. 2012; www.barrypopik.com/index.php/new_york_city/entry/a_man_with_one_watch_knows_what_time_it_is_a_man_with_two_is_never_sure.
2. D.L. Parnas, "On the Criteria to Be Used in Decomposing Systems into Modules," Comm. ACM, vol. 15, no. 12, 1972, pp. 1053–1058.

GERARD J. HOLZMANN works at the Jet Propulsion Laboratory on developing stronger methods for software analysis, code review, and testing. Contact him at [email protected].

Selected CS articles and columns are also available for free at http://ComputingNow.computer.org.