Technical ramblings
Wednesday, July 27, 2005
  Seven, plus or minus two.
One of my best friends in high school, a woman who went on to become a doctor, told me of a test given to patients to help assess their cognitive functioning: you read them a list of numbers and have them recite it back. Most people, when read a list of numbers and asked to recite it back, remember around seven of them, give or take a few.

I had forgotten this test of short-term memory until a co-worker brought up a coding principle: objects, methods, collections of interfaces, or whatnot should be developed in groups of seven, plus or minus two.

The idea is simple. The human brain can really only hold about seven things, give or take two, in short-term memory without significant stress. So, for example, seven numbers--which is why American telephone numbers have seven digits--or seven objects, or seven names, or seven modules... Seven, give or take two.

Violate this principle and you just cause short-term confusion.

In the realm of programming, this principle is very important. In our quest to simplify a problem, we should be willing to break things down into manageable chunks. And by "manageable", I mean chunks that are roughly around seven "things", give or take two. Now this isn't a hard and fast rule--like all good rules, it must on occasion be violated. And by "things" I don't necessarily mean seven methods per class or seven classes per package. But the methods within a larger class should conceptually be "clumped": thus, while the java.awt.Graphics class has dozens of methods, conceptually they can be clumped into a handful of rectangle methods, a handful of oval methods, and so on.

And while a package may have more than seven classes in it, conceptually those classes should also be "related": thus, six interfaces, eight exceptions, etc. You can even have more than nine implementation classes, so long as they are conceptually related, such as five classes that do one thing and four that do another. (Of course, if what they do is sufficiently unrelated, perhaps they belong in a sub-package.)

Functionality for a larger program can be broken down into clumps as well. In the Macintosh Toolbox, we had the Window Manager, the Dialog Manager, the Control Manager, and QuickDraw--four things which, in the larger scope of the Macintosh API, were conceptually related under the heading "GUI managers." (Separate were the File/Resource managers, the Network/Serial I/O managers, etc.) Then under each manager, you had "stuff that creates instances", "stuff that manipulates instances", "stuff that destroys instances"--and so forth.


I believe this is a major shortcoming of the Javadoc tool, by the way: it fails to allow programmers to clump related methods together and create a sort of "meta-index" of methods.

The way each Macintosh Toolbox chapter was organized was very illustrative: it would give a one- or two-page introduction outlining what that part of the Toolbox did. Then it would give five or six sections illustrating what you could do and how to do it, in general terms. ("How a window interacts." "Window lifecycle." "How to manipulate windows." "How to handle window events.") Then at the end was the API specification--but broken down into clumps of functionality: the methods for opening a window, then the methods for manipulating window properties, then the methods for window events, etc.

Javadoc, however, doesn't provide the ability to create this "meta-clumping" of methods. While obviously Javadoc should never replace proper documentation, at the very least it should be possible to have a comment that essentially says "the next eight methods are all related to 'Thing'", which would then generate an index entry at the top of the Javadoc page--"Thing methods"--linking to those methods.
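Just to make the idea concrete, here's the sort of thing I'd like to be able to write. The @group tag below is entirely hypothetical--no such tag exists in Javadoc--but a class marked up this way could generate exactly that kind of meta-index:

public class Window {
    /**
     * Opens the window.
     * @group Window lifecycle   (hypothetical grouping tag)
     */
    public void open() { /* ... */ }

    /**
     * Closes and disposes of the window.
     * @group Window lifecycle
     */
    public void close() { /* ... */ }

    /**
     * Moves the window to the given screen coordinates.
     * @group Window manipulation
     */
    public void moveTo(int x, int y) { /* ... */ }
}

The generated page would then start with a short index--"Window lifecycle", "Window manipulation"--each entry linking to its clump of methods.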

So while a class with 50 methods may observe the "seven plus or minus two" rule with seven sets of seven methods (give or take), Javadoc gives the developer no way to quickly show what those groups are.

And that's a shame.
 
  I hate Java Beans.
Okay, so I really don't hate Java Beans. But I do hate the abuse of this design model, just as I hate the abuse of any otherwise reasonable design model.

What? Don't know what I'm talking about? Okay, here goes, for those who aren't indoctrinated into the cult ways of Java programming--an art which, to be honest, I'm still not quite comfortable with.

See, in languages such as Visual Basic or Visual C++, you have this Microsoft-ian thing called a COM object, which is a way for objects to present a multitude of programmatic interfaces, and you can--at run time--discover what interfaces your object provides. So, for example, you can create a basic COM object which provides two interfaces, IInterfaceA and IInterfaceB, and call the object's QueryInterface() method to obtain a reference to the other interface. COM objects also support reference counting, so you can have multiple references to a single object, and when the last reference is released, the object is automatically deleted.

On top of this rather simple (and banal) concept, which predates the dynamic_cast<> operator in C++, Microsoft introduced the IDispatch interface. This allows a programmatic way not only to query which of several interfaces a COM object provides, but also to perform a form of "introspection", so that you can enumerate (via the IDispatch interface) the list of methods provided by an interface and invoke them.

For Java programmers, I'm sure you'll think "who cares"--after all, what Microsoft does in thousands of lines of C++ code, Java programmers get for free in the java.lang.reflect package.
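A minimal sketch of that free introspection--enumerating a class's methods at run time and invoking one by name:

import java.lang.reflect.Method;

public class ReflectDemo {
    public static void main(String[] args) throws Exception {
        // List every public method of java.lang.String at run time.
        Method[] methods = String.class.getMethods();
        for (int i = 0; i < methods.length; i++) {
            System.out.println(methods[i].getName());
        }

        // Look up and invoke String.length() purely by name.
        Method length = String.class.getMethod("length", new Class[0]);
        Object result = length.invoke("hello", new Object[0]);
        System.out.println("length = " + result);  // prints "length = 5"
    }
}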

But Microsoft, never one to be stopped in its desire to pile tons of crap on top of other crap until something useful comes out (sorry--"innovate"), used COM objects with an IDispatch interface to create GUI components which can be invoked and run at design time as well as at run time.

Why is this significant, you say? (And that's assuming you have the foggiest idea what the hell I'm talking about...)

Well, it means that some junior GUI engineer who doesn't know how to write a line of code can sit down in front of a GUI builder and drag and drop the components of an interface and hook them together. Each component provides a set of standard interfaces (such as IDispatch, used to discover what object settings--such as the color or font of a button--are available), and that set of standard interfaces allows a GUI builder to create an instance of, oh, say, a button, and hook it into an instance of, oh, say, a dialog box.

Voilà! Instant interface; just drag and drop and hit the "save" button! What used to take several weeks in the late 1980s with the Macintosh Toolbox now takes an afternoon in Microsoft Windows. (And I should know; I wrote enough Macintosh Toolbox code in my day.)


Java gives you introspection pretty much for free, but being able to build visual interface components, drag them together, and get a working interface is a pretty cool idea--especially since the first release of Java's AWT was even worse to use than the Macintosh Toolbox. So Java introduced the concept of a Java Bean.

Now just because you can find a method doesn't mean you know what that method is, programmatically speaking. That is, just because you know that your class has a bunch of methods called "updateThing", "computeItem", and "validateObject" doesn't mean you have a clue what those methods are supposed to do. So the Java Bean specification defines some standard method naming conventions so that a program inspecting your bean can know what's going on.

In other words, a Java Bean is just a class whose methods follow a certain naming convention. For example, according to the specification, if I have a property called "font" which takes the name of a font, then my bean should provide two methods, 'String getFont()' and 'void setFont(String)'--the first to get the current font, the second to set it. A program which loads your bean can then know your bean has this property called "font" which takes a string--and even prompt you if you want to change the font.
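Here's a minimal sketch of such a bean (the class name and default value are just for illustration):

public class LabelBean {
    private String font = "Helvetica";

    // This getter/setter pair is what tells an introspecting tool
    // that the bean has a String property named "font".
    public String getFont() {
        return font;
    }

    public void setFont(String font) {
        this.font = font;
    }
}

A builder tool uses java.beans.Introspector to turn those method names back into a list of properties it can show the user.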


So far, happiness. And if done right, you can then pop up a visual editor, drag and drop your beans together, and have a new application. And better yet, another user can come along, grab your beans, and build something new you never thought of.


So why do I hate beans?

The tradeoff of any sort of visual editing system is that you wind up adding additional complexity to the code. This is fine if your objects are well defined: I don't mind a little additional complexity if I'm creating (for example) a spreadsheet visual component and want someone else to be able to plug it into their program without pestering me too much.

But if you are not using this bean object within its original context, well, it's just additional complexity that doesn't get you anything except extra reams of code that are almost impossible to maintain.

To give you a simple example, suppose I have an object 'adder' which adds two numbers. Intuitively, you would invoke it by writing:
sum = adder.add(a,b);

However, if we were to rewrite this as a bean using the getter/setter method naming conventions, this is what we could potentially wind up with:
adder.setFirst(a);
adder.setSecond(b);
adder.sum();
sum = adder.getResult();

We've replaced one line of code, which is relatively easy to understand, with four lines of code which are a royal pain in the ass to follow.
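For the record, here's roughly what the two shapes of that adder look like (a sketch; the names are just for illustration):

// The straightforward version: stateless, one call.
public class Adder {
    public int add(int a, int b) {
        return a + b;
    }
}

// The bean-flavored version: state plus getter/setter ceremony.
public class AdderBean {
    private int first;
    private int second;
    private int result;

    public void setFirst(int first)   { this.first = first; }
    public void setSecond(int second) { this.second = second; }
    public void sum()                 { this.result = first + second; }
    public int getResult()            { return result; }
}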

Now if we are constructing our adder object to be used within a GUI builder or visual application-builder IDE, then we may well want to build it this way: we could (for example) drag a line from the "Read Input Field" field to our "First" input, a second line from the "Read Instrument Input" field to our "Second" input, and a line from our "Result" output to our "Display Meter" object.

But if we are not touching this thing in a Java Bean visual editor? All we've done is make the simple operation of adding two numbers four times more complex. Further, we've created a stateful bean object and opened the door to unnecessary complexity that can make the code impossible to maintain. So, for example:
adder.setFirst(a);
thing.process();
callThisRoutine();
System.out.println("Some log stuff");
callThatRoutine();
adder.setSecond(b);
thing.reprocess();
adder.sum();
System.out.println("Some more log stuff");
callAnotherRoutine(a);
doSomethingElse();
return adder.getResult();

You see what has happened? We've managed to completely bury the fact that this routine returns 'a+b' in endless, meaningless, useless complexity.

And because the meaning is now buried, we've created code that someone new to the team would not want to touch with a ten-foot pole; after all, they have no way of knowing what is meaningless complexity and what is important complexity--that is, complexity driven by the needs of the project, rather than complexity thrown in out of laziness on the part of the programmer.

Yes, laziness: part of our job as programmers is supposed to be simplification of the problem (but not oversimplification), so we can understand the problem, solve it in the simplest way possible, and write code that can be maintained in the future.

And our routine above, which returns 'a+b', is now impossible to maintain.


You don't think that happens in the real world?

Poor naive fool. Just apply for a job at Symantec and I'll be happy to show you real-world examples of this sort of nonsense.
 
Wednesday, July 20, 2005
  Macintosh on Intel Rumors
So we all should know by now that Apple is switching to Intel. There has been some speculation as to why Intel, with the latest of it revolving around an upcoming move by Intel into video on demand over the Internet, perhaps to be branded (as Cringely suggested) the "iTunes Movie Store."

Cringely brings up the notion that it's also possible that Apple may not intend to go forward with the changeover for their complete line of products. After all, in the high-end arena, while Intel generally wipes the floor with the IBM G5 processor, for numeric computation the G5 still rules Intel by a huge margin--so vendors who are interested in raw floating-point computational power are going to get the short end of the stick.

There is some evidence for this possibility from Transitive Corporation, the people who created the QuickTransit technology at the heart of Apple's Rosetta--the technology that allows PowerPC code to run at roughly 60% of native integer performance on an Intel processor. If we examine their web site's product map, we find that not only do they make QuickTransit for x86, but they also make QuickTransit for PowerPC, which allows PowerPC processors to run Intel code at around the same 60% integer performance.

This means that if Apple decides to create a "split technology" lineup, with low-end systems running on Intel and high-end workstations running on PowerPC, it would be a simple matter of licensing the technology from Transitive to allow Intel-only Mac software to run on PowerPC processors without having to build a "fat" application. And it would allow owners of older Macintosh systems to do Mac-on-Intel development without having to upgrade their hardware, though I suspect Apple will more likely ask developers to fork out for a new, albeit inexpensive, low-end system.
 
Thursday, July 14, 2005
  Service Discovery in the Enterprise
So here's an interesting problem. Suppose you're working on a distributed software management tool, and you want to know which component is where, so your interconnected components survive things like IP address changes under DHCP, and so you can make things easy for the user to install.

Service discovery would be really handy, don't you think? After all, the user wouldn't have to configure eight million things: they'd just drop the component (software, appliance, etc.) onto the network, and it could reach out and figure out who else was there. You may need to have the installer take some sort of password or other credentials (to minimize hacking), but otherwise, just plug and play.

And with IP multicasting, a simple service discovery mechanism for your software is pretty much a snap. Of course you'll need to come up with some sort of "bridging" mechanism to link islands of components on various local LANs across your corporate WAN, but that's a matter of configuration--letting one LAN component know where another, non-"discoverable" LAN component is located. This is far easier than having to configure every God-damned tool.
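To give a feel for how little is involved, here's a bare-bones sketch (not any particular product's protocol; the multicast group, port, and message format are arbitrary picks for illustration):

import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.net.SocketTimeoutException;

public class DiscoveryAnnouncer {
    public static void main(String[] args) throws Exception {
        // Join an (arbitrarily chosen) multicast group and announce ourselves.
        InetAddress group = InetAddress.getByName("239.255.42.42");
        MulticastSocket socket = new MulticastSocket(4446);
        socket.joinGroup(group);

        byte[] hello = "HELLO service=mgmt-agent".getBytes();
        socket.send(new DatagramPacket(hello, hello.length, group, 4446));

        // Listen briefly for replies from components already on the LAN.
        // (We may also hear our own announcement echoed back; a real
        // protocol would filter that out.)
        byte[] buf = new byte[512];
        DatagramPacket reply = new DatagramPacket(buf, buf.length);
        socket.setSoTimeout(2000);
        try {
            while (true) {
                socket.receive(reply);
                System.out.println("Found " + reply.getAddress() + ": "
                        + new String(reply.getData(), 0, reply.getLength()));
            }
        } catch (SocketTimeoutException done) {
            // No more replies; this discovery pass is complete.
        }

        socket.leaveGroup(group);
        socket.close();
    }
}

A receiving component would sit in a similar loop on the same group and answer each HELLO with its own address and service name.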

And designing such a thing is fairly simple: first, go out and look up some of the existing protocols, such as the Zeroconf protocol that Apple has been pushing. It's good to know what sorts of things people are doing. And what Apple is doing is creating a protocol where box A asks "who out there has 'foo'?", to which boxes B and C answer "I do." Of course, for our purposes we may want to do something slightly different from Apple: we have specific problems to solve that may not quite fit Apple's protocol.


But there is something odd out there in Enterprise software. For some reason or another, "ease of use" often still gets equated with "toy operating system"--even though service discovery is a lot more effort (especially in the Enterprise, where there are additional security requirements) than simply prompting the user with a bunch of badly designed dialog boxes and making the poor user put all of his managed boxes on fixed IP addresses.

What is with that?

At times I have to wonder if Enterprise software developers fall into two groups. There are those who are working very hard to solve hard problems--such as building transaction processing engines that no one but a handful of experts understands. Then you have the folks who hook all this stuff together, to whom "automatic configuration" is somehow a "toy", yet whose idea of usable design involves handing the user a bunch of command-line tools because they couldn't think through the design of their architecture.

Ease of use is not something you tack on; it's something you design in from the bottom up. But for some reason or another, "ease of use" is something most Enterprise people try to tack on at the end, because they don't understand it at all. Because "ease of use" is for toy operating systems, not for "serious" people...
 
Tuesday, July 12, 2005
  A debate with a co-worker.
When I came to work at Symantec I did so to learn how to work on big teams. And I'm realizing one of the lessons to learn is how to deal with people whose opinions are--well, "stupid" may be too strong a word. Perhaps a better term is "people I disagree with."

One of these disagreements stemmed from a debate over the proper use of unit test code during the build process. This co-worker and friend of mine asserted that unit test code (that is, software designed to run during the build process to automatically test the validity of libraries within the code) should run every time the product builds, even during the "edit/compile/debug" cycle of development.

(For those who may not know: the biggest activity software developers do is the "edit/compile/debug" cycle. During this cycle you write a little code, compile it, then test and debug it to make sure it works. You never write working code the first time, so you wind up finding the bugs, editing the code to fix them, compiling, and debugging--in a near-endless loop. When things work well, a typical developer can go through this cycle some forty or fifty times in an 8-hour day.)

His assertion was that because the unit test code exercises the entire product automatically, it is in fact part of the debug process and is essential. My assertion is that the developer should be able to turn this testing off, so the code can be compiled more quickly. The faster it is to compile the code, the more times the developer can go through the "edit/compile/debug" cycle, and the more productive the developer is.

Now his point wouldn't be half bad if it only took a few minutes to run the unit test code. But for our product, it takes over a fucking hour to run all the unit test code. And there is no way to turn it off.

Which means that even the most intelligent and brightest developer in the world can go through the edit/compile/debug cycle at most six times a day, rather than forty or fifty. Furthermore, the developer can no longer "experiment", trying different things--because each time you want to "try" something, you pay a penalty of over an hour waiting for all the test code to run. And that assumes all of these unit test programs run correctly--and unfortunately we have a few poorly written unit tests that are subject to race conditions and can occasionally and randomly fail.
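For the record, all I'm asking for is something like this--a sketch, not what our build actually does, and the property name is made up--so the slow tests stay out of the way during the edit/compile/debug cycle:

import junit.framework.TestCase;

public class SlowLibraryTest extends TestCase {
    public void testExpensiveValidation() {
        // Skip unless the build explicitly asks for the slow tests,
        // e.g. by passing -Drun.slow.tests=true to the JVM.
        if (!Boolean.getBoolean("run.slow.tests")) {
            return;
        }
        // ... the hour-long validation would go here ...
    }
}

The nightly or release build sets the flag and runs everything; the developer at his desk doesn't have to.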


Have you heard of the Chinese method of water torture? Where a drop of water pelts the victim for hours at a time--drip, drip, drip?

Our development environment, because we have to wait an hour to test some code, is the programmer's equivalent of Chinese water torture.
 
