09 September 2009

Welcome Back to America: Door-to-Door Christians

So, I've been back in the country for a week now. In some ways it's really nice to be here, and in others it's driving me crazy. In particular, I've had two visits from door-to-door Christians in as many days.

The first visit came during the daytime, and I tried to politely refuse. I even wished them a good afternoon.

But tonight, they started banging on my door at 9pm, while I was in bed. I tried to ignore them, but they banged again, making me think that my neighbors had some emergency. So, I got up, opened the door, and a well-dressed young man proudly announced "Hi, we're Christians!"

I slammed the door in their face.

And although, in this second situation, I think that the solicitors were behaving obnoxiously, I still feel a need to explain why. Certainly, they thought they were doing something good. Let me express an alternate interpretation of the interaction. When you come to my house and try to tell me about your religion, here is what I hear:
"Hello, we don't really care what you believe because we are right and you are wrong. Although popular culture has presented ample evidence that many people do not appreciate our presence, and although we should respect that you may have a different opinion, we chose to ignore what you believe and what you want, and to instead impose our will on you."
No matter how you sugar-coat it, it's as rude a statement as is possible. I shouldn't have to post "no soliciting" to escape this.

Just be glad I don't knock on your door to tell you how you're wrong.

24 August 2009

Excellent Quote: Chomsky on English Spelling

conventional orthography is ... a near optimal system for the lexical representation of English words.
Chomsky, N. & Halle, M. (1968) The Sound Pattern of English

Compare to Mark Twain's "Spelling Reform: A Plan for the Improvement of English Spelling,"

Fainali, xen, aafte sam 20 iers ov orxogrefkl riform, wi wud hev a lojikl, kohirnt speling in ius xrewawt xe Ingliy-spiking werld.

22 August 2009

One of my few Indian Victories: Cheap Tech Books.

One of the most respected algorithms textbooks out there, Cormen et al.'s Introduction to Algorithms, will cost you $62 from Amazon (2nd Ed., paperback). On the other hand, I just bought it here in India for Rs 350, brand new. That's about $7 US. Wonderful. I may have to go back for TAOCP 1-3.

12 August 2009

Excellent Quote: How programmers think

I can't tell you how correct this passage is:

Most programmers above a certain level are very good at conceptualising large, complex systems. They can interrogate perceived weaknesses in a program before it is even written. This is how good programmers manage to write programs that are largely free of defects and that generally work in the way that is expected. Small-scale errors aside, a good high-level conceptual understanding of the system, coupled with an ability to mentally simulate the behaviour of any portion of a program provides programmers with most of the tools they need to construct good-quality programs. The programmer is likely to know (at least at some high level) how the entire system is going to work at any given moment.

From Adventures in Programming Languages and Semantics, "If concurrency is easy, you're either absurdly smart or you're kidding yourself." The full article is a good read.

04 August 2009

Review and Analysis of C# Part 1: Partial Definitions are an Anti-pattern

I've had to use C# a lot at work recently. Like everything else, I have very strong opinions about C#. So, I've decided to write an N-part series reviewing, criticizing, and sometimes even praising its design.

So, here's the first installment: Partial Definitions are an Anti-Pattern.

C# introduces the partial keyword. In short, the partial keyword allows a programmer to partially define a class at several places. According to MSDN, the rationale for this feature is:
  • When working on large projects, spreading a class over separate files enables multiple programmers to work on it at the same time.

  • When working with automatically generated source, code can be added to the class without having to recreate the source file. Visual Studio uses this approach when it creates Windows Forms, Web service wrapper code, and so on. You can create code that uses these classes without having to modify the file created by Visual Studio.

The first argument holds no water. Yes, projects sometimes grow large, and multiple programmers often need to work on them concurrently. But this problem is better handled by version control software.

To make this argument, let me introduce the notion of a section of code. Let's say that a section is a collection of lines of code such that a significant modification to any one line requires modification to (or at least a critical evaluation of) the other lines within that section. For example, changing the type of a variable requires you to revisit each use of that variable; changing an invariant of a class requires you to revisit all lines which assume that invariant. It's a loose definition, but all the programmers out there know what I'm talking about.

Now suppose that two or more programmers are trying to modify some large class. At any given moment, either they work on the same sections of code, or they do not. If they work on different sections (which implies they are working on different lines), then common version control software will be able to automatically combine their efforts. On the other hand, if these two programmers are working on the same section of code, the programmers will need to coordinate their efforts---even if they are able to isolate their changes to disjoint lines of code or separate files. Said another way, the ability to split the class into two separate files does nothing to solve the original problem.

The second argument is no stronger. Suppose a code generator produces a large class definition. Further suppose that, across multiple runs of the code generator, much of the output is common. Why then, I ask, would the code generator repeatedly emit redundant code? Wouldn't it make more sense for the code generator to place the common code into a runtime library, or even a base class from which the situation-specific code may derive? Just because it is a code generator, there is no excuse to emit code which ignores good software engineering practice.

There may be efficiency concerns pushing towards this style of code generation output. It could be that there is a significant difference in run time. I argue that this is a symptom of a problem elsewhere in the software stack. If two different implementations differ only on a cosmetic level, but the compiler exhibits a drastic performance difference, then it is the compiler's responsibility to decrease that performance difference. In a high level language---and especially in the case of a language stack as complicated as the CLR---these kinds of implementation details should be below the programmer's cognitive radar.

I have argued against the claimed benefits of partial classes, and I'll be happy to argue against any others that people suggest. Let me next argue that partial classes not only give no benefit, but that they also do harm.

The primary goal of software engineering is to create best practices by which programmers can reason about software in a modular fashion. Classes, for instance, enable us to think about one algorithm at a time, and to ignore all of the other parts of the software. Imagine trying to prove that your implementation of a binary tree is correct if it depends upon the operation of your GUI. Yeah, that's ridiculous, and that's why we write modular programs.

Now suppose that you have written a module so complicated---so multi-faceted---that you believe that some parts of its definition should be expressed separately from the rest. This suggests you believe (1) that your module expresses at least two different aspects of the problem, (2) that a programmer can reason about each independently, and (3) that any attempt to reason about them as a single unit will lead to unnecessary confusion. All of these are reasons to place the second aspect of your module into a separate module, not a separate file. In fact, many design patterns describe interfaces which tackle precisely this sort of fine-grained interaction among modules. The only reason to keep them in one module is programmer laziness. In this case, the partial feature enables the programmer to defer good design; he may get the code to work faster, but may cause headaches for all of the programmers who must maintain the code.

Additionally, as a programmer, I find it valuable to see the entire definition of an algorithm (feel free to disagree; let me know why in the comments). I don't necessarily look at the entire definition, but I think it is critical to be able to find any code which contradicts my understanding of the code. In this case, a single definition (or at least a single file) enables me to quickly gather all factors which contribute to the operation of an algorithm. From my experience with C#, partial class definitions make it harder to gather those factors, and make it difficult to be sure I have found all of them.

Virtual methods might cause this problem as well. However, virtual methods are more restricted. First, they are marked virtual. Second, I know immediately where to find refinements of virtual methods: lower in the inheritance tree. Partial definitions have an arbitrary number of components---they can be anywhere provided they are combined at compile time.

Some may counter that these arguments all suggest a feature-rich integrated development environment. Sure. I'm in favor of good tools. But (1) I appreciate languages which don't require IDE support to read, because sometimes you read code on a website, a book, or other media, and (2) even the best of the best tools don't yet support this style of code browsing. I've been using Visual Studio 2008 at work, and it can't even list all of the subclasses of a base class for me! Enumerating all parts of a partial definition is equally challenging.

My recommendation: avoid the partial keyword, and instead decompose your software into modules sensibly.

08 June 2009

The Foreign Registration office in Bangalore

The government of India requires that, as a foreigner, I register with the police within a fortnight of my arrival. I have no problem with this conceptually; I believe in India's sovereignty, and I think it is within their rights. However, the implementation of this policy sucks.

First of all, I admit that I don't speak Kannada, and I don't speak Hindi (or Tamil, Telugu, Assamese, or any of the twenty-some official languages of India), but in my defense, I strongly doubt that the cumulative language coverage of the officers would accommodate every citizen of India, and thus the difficulty-of-communication issue is moot. Let me also qualify this by saying that my company has a liaison team, whose mission is to make this process much easier for me.

Nonetheless, I had to visit the foreign registration office (FRO) six times over the last week, and had to pay a $30 fee for late registration.

(Visit 1) I arrive at the office, sign in, and am handed a bundle of forms. I don't know why my liaison team doesn't keep these forms on file.
(Visit 2) Because of several work engagements, I couldn't start the process until 4:30pm. The liaison and I drive across town to a notary, and then back to the FRO office. When we get within a block, he tells me to get out and run. I walk in at 5:25pm, but the clerk swears it's 5:30, that there's nothing I can do, and that he doesn't care that I have to pay a $30 late fee.
(Visit 3) I come back the next day, but I forgot my passport. D'oh
(Visit 4) Later that day, they review my paperwork, highlighting several mundane details. They hand me the signed form and tell me to go to the other counter. There, they tell me to sit and wait. Thirty minutes later, I ask why I'm waiting, and he tells me to go back to the first counter. There, they take my forms, hand me a receipt, and tell me to come back the next evening.
(Visit 5) I arrive the next evening, and hand them my receipts. They tell me to go to the other counter, and the second counter tells me to wait. Twenty minutes later, I ask why I'm waiting, and they tell me to go back to the first counter. There, the man gives me a deposit slip, and asks me to take it to the bank down the street, where I must deposit $30. When I return to the FRO, they give me a receipt and ask me to come back Monday.
(Visit 6) I arrive, sign in, and hand over my receipt. He tells me to go to the other counter and ask for Ajita (sp?). I do so; he looks at it for 30 minutes, then hands it back and tells me to go to the other counter. Finally, I am a legal resident.

The other interns have similarly horrifying stories; some were given business visas instead of employment visas; others were given student visas. These people have a lot more work to do before they can be legally paid within India.

I'll leave it as an exercise to the reader to replace the FRO office with 100 lines of code in the language of your choice.

20 May 2009

I'm off to India

I leave for India tonight! I'll be back in September.

19 May 2009

Power Steering Fluid Pump, Mercedes-Benz 190E (W201)

The girlfriend and I spent an hour trying to find the power steering fluid reservoir on a Mercedes-Benz 190E (a.k.a. W201). Even the internet didn't know. We found it again, in hopes that the next guy will find it more easily.

Power Steering Pump; Mercedes-Benz 190E (W201)

This is the power steering pump in a Mercedes-Benz 190E (a.k.a. W201). The top of the power steering pump is the power steering reservoir.

Simply unscrew the black nut on top, and remove the lid. You should extract the old fluid using something like a turkey baster (which, of course, you should never use on food again). Then fill it mostly to the top (owner's manual says 0.6L).

Good Quote: Le Corbusier

... Man's stock of tools marks out the stages of civilization, the stone age, the bronze age, the iron age. Tools are the result of successive improvement; the effort of all generations is embodied in them. The tool is the direct and immediate expression of progress; it gives man essential assistance and essential freedom also. We throw the out-of-date tool on the scrap-heap: the carbine, the culverin, the growler and the old locomotive. The action is a manifestation of health, of moral health, of morale also; it is not right that we should produce bad things because of a bad tool; nor is it right that we should waste our energy, our health and our courage because of a bad tool; it must be thrown away and replaced.

Le Corbusier
in "Towards a New Architecture"
Translated by Frederick Etchells

28 April 2009

Cute typo: Turing Basin Park

View Larger Map

This is supposed to be "Turning Basin Park," but apparently some geek over at Google thought "Turing" would be more appropriate, given the proximity to Princeton.

24 March 2009

Walk Score -- measuring pedestrian friendliness

As I've mentioned before, I don't have a car, don't want one, and don't think that people should have an unquestioned right to drive a car. I'm an urbanist, and I believe that people should be able to walk to all of their life's necessities, or in the worst case, ride a bike or public transportation. Since I've moved to Princeton, one of my major complaints has been how inconvenient it is for someone without a car.

Today, I discovered WalkScore (via the Freakonomics Blog). It is a Google Maps mash-up which tries to quantify how walkable an area is, according to distance from common necessities such as grocery stores and bars. Very clever. And for comparison:

  1. My old neighborhood in Charlottesville: 85/100 - very walkable
  2. My old neighborhood in Brooklyn: 88/100 - very walkable
  3. My current neighborhood in Princeton: 29/100 - car-dependent
  4. Washington Square Park, NYC: 100/100 - walker's paradise.
It's nice to have some external validation on my opinion of Princeton ;)

In all fairness to Princeton, if I were to live on Witherspoon at Spring St, I'd live in a Walker's Paradise (95/100), but what grad student can afford that?

05 March 2009

Bjarne Stroustrup is so metal!

On Wednesday, Bjarne Stroustrup made a visit to Princeton, and the Liberty Research Group scheduled a private meeting with him before his colloquium. We spent a little over an hour discussing C++0x and the future of C++. Overall, Bjarne is a cool guy, and I had an excellent time talking with him.

The wikipedia article about C++0x is painfully out of date. Bjarne mentioned some upcoming features that are hardly mentioned, if at all. Of these, the ones which took my interest the most were Concepts, Axioms, and the future of static typing in C++.

I'll disclaim that I haven't read any C++0x standardization documents, not even drafts, so take everything here with a grain of salt. These are simply my take-away points from the talk.

Bjarne is concerned that the template feature of C++ does not allow programmers to sufficiently constrain template parameters. For instance, a sort algorithm may be implemented to operate on a random-access iterator. However, recall that iterators in the C++ STL do not share a common inheritance (and there are fair arguments that they shouldn't). C++ templates cannot capture the notion that a formal parameter must be an iterator-like type, and so specializing the template against, say, a string will likely yield hundreds of uninformative type errors instead of a simple error stating that we expected an iterator.

One solution to this problem would be to create base iterator classes in order to ensure an interface (this is the approach taken by Java, for instance). However, one could argue that this is an inappropriate use of inheritance---it's not code re-use, and although we give many concepts the name iterator, they really have little in common.

Some languages use a notion of structural equivalence (or, less formally, duck typing) to capture these constraints. I think this is the appropriate technique. The earlier example of a sorting algorithm simply requires some object which can provide random access to a data structure, and the most direct test for compatibility is whether that interface is satisfied. In C++0x, Concepts are an implementation of structural equivalence (from the high-level, language view). In the low-level view, concepts are a special case of templating, and produce multiple implementations of the same code according to the specific type.

In a C++0x Concept, one can specify the methods which must be supported. For instance, one may have a Concept named TotalOrder, which must support a less-than-or-equal-to operator. Many types satisfy the TotalOrder concept, such as int, double, and std::string, yet they are not required to have a common ancestor. In this sense, a concept allows you to communicate many traits of a type without paying the penalty of multiple inheritance.

But Concepts get cooler. Although I describe them as a mechanism for structural equivalence in C++, they can also be viewed as mixins, since they allow a programmer to specify a default implementation for many methods. With the prior example of TotalOrder, one could also add default implementations of greater-than, equal-to, etc, operators as functions of less-than-or-equal-to. In this way, it becomes very easy for programmers to bootstrap support for all operators while writing minimal code.

Suppose that I implement some class that satisfies TotalOrder. This information simply means that my class supports comparison operators. One wonders, however, if these operations actually behave as a total order. As a counter example, I could write a random number generator to produce the result of the less-than-or-equal-to comparison---can a language be made to disallow this? Or, can a compiler be taught to rely upon deeper qualities of total orders, without having a special case for total orders?

Axioms allow the programmer to specify additional qualities of operations that transcend simple typing. For instance, one could write axioms that signify that the less-than-or-equal-to relation is transitive, or that it is always the logical inverse of the greater-than relation. With axioms, we can state that a sort algorithm not only requires a random-access-iterator, but also that there is a total order of the elements which satisfies transitivity. The use of concepts and axioms allows us to specify all of the constraints of the sort algorithm.

But how can the compiler use axioms? It is unlikely that any compiler will ever be able to verify that the listed axioms are true for a given implementation. Instead, axioms will provide additional transformations to the compiler's arsenal of optimizations. With support for axioms, many compiler optimizations including constant-folding (especially with associativity), reduction in strength, and common-subexpression elimination can be lifted from the compiler and expressed at the language level. I think this is really cool.

I think the notion of concepts and axioms struck home with me because they resemble some of my thinking on one of my side projects. I've been working on a language for years now (never enough time, alas!) in which I plan to feature structural equivalence in typing for reasons similar to those outlined by Bjarne. The axioms are a natural extension which I now realize I would like to support as well.

Finally, before the meeting was over, I had to ask a question about type safety. Throughout his book The Design and Evolution of C++, Bjarne laments the type (un)safety of C. Repeatedly, and with each new language feature, he attempts to make C++ more typesafe, but is repeatedly foiled by inherited features such as void pointers and unrestricted casting. So, I asked: are you ever going to take the plunge and drop support for pointers in C++? His answer was intelligent, and showed a great understanding of not simply the technical considerations of C++, but also the political considerations.

Mr. Stroustrup suggested that he hopes to add more and more features to C++ until type-safe alternatives are available for each and every unsafe aspect of C++. Once that has happened, he hopes to define a safe subset of C++. At that point, all existing C++ tools could be given a flag to provide warnings of unsafe features. This safe sub-C++ will immediately have rich toolchain support, but will be able to provide strong guarantees, such as those provided by ML. Excellent plan.

Bjarne is a very interesting person indeed; I recommend you talk with him if you ever have the chance.

01 March 2009

A quick thought...

The other day, Daya and I rode our bikes to the local grocer to buy some groceries. At the checkout line, we stuffed our groceries into our bags, when the teller asked, "Did you guys ride your bikes here?"

"Yes," I responded, "How else does one buy groceries?"

She deducted a dollar from our total, and told us the store offers a ride your bike discount. The guy in line behind us muttered something to the effect of "I'm glad you're doing your part."

It's at times like these that I want to say "No! I'm not being a responsible person; rather, you are being irresponsible!" Honestly, I think they should add a fee for driving to the grocer, perhaps charge for parking, or make you wear an "I hate the environment" shirt.

But, I bit my tongue.

On an unrelated note, Bjarne Stroustrup is visiting Princeton CS on the 4th for a 4:15 colloquium. If you're in the area, you should check it out. It will be packed.

06 February 2009

Failed reasoning by an Ivy League Grad Student; Engineering and Approximation

Let me babble for a little bit...

There was an extended power failure in Princeton last night---for two hours, all three power inlets to the borough went down, and so all the uninterruptible power supplies ran themselves dead. While the CS department's servers were hooked up to a secondary generator, the servers for my research group went down.

After power was restored, our machines booted back up, but in the incorrect order. One of the other students in the group had to go visit them to remount NFS partitions, make sure they all came back up, etc, etc.

While he was down there, he observed a chirping noise. Looking around underneath tables and racks, he found the culprit: an uninterruptible power supply that wasn't plugged into the wall, and had nothing plugged into it. Judging by its position, he decided that this was the power supply for a server that had been out of commission for at least a decade. He reasoned that it had taken about ten years for the minute inefficiencies in this UPS to drain the battery.

But, the damned thing was still chirping. Looking around, he couldn't find a free plug, and so he couldn't restore power to the UPS. "How can I make this stop chirping?" he asked himself.

His solution amazes me: he plugged the UPS into itself.

Now, clearly, the UPS isn't going to charge, and he doesn't dispute that. He argued, however, that this should be enough to fool the UPS into thinking that it was charging. In other words, he conjectured that the UPS chirps iff it does not have wall power; he was wrong, because he assumed the UPS engineers had made an approximation they had not. In actuality, the UPS continued to chirp. This is because the UPS chirps iff there is a negative change in the charge of the battery. It takes some time to measure this change, and this is why UPSes will chirp for a short while after you plug them back in.

Why do I bring this up? Because it allows for an interesting analysis of approximation, and maybe we can pull out some principles of design. What semantics does the chirp noise carry? The chirp noise should denote battery discharge. Under the assumption that the UPS can always charge the battery as fast as it is discharged when plugged in, the UPS would stop chirping when plugged in. The question is: do you want to measure the real scenario, or can you get away with measuring symptoms of the scenario?

Engineering is the fine art of approximation. Approximation is so ingrained into our reasoning that we have mathematical operators to describe it: x << y means that x is significantly smaller than y; x -> y means that the value of x comes arbitrarily close to y; epsilon denotes an arbitrarily small number; we say a function f is O(g(x)) to mean that as x gets arbitrarily large, f(x) is bounded by some constant times g(x). The meanings of significant, arbitrary, and some vary wildly by situation.

Engineers make jokes about it too: a horse is a sphere if it makes the numbers easier; 2+2==5 for very large values of 2. Approximation is critical, but only where appropriate.

In the UPS example, the charge of the battery can be measured more or less directly. In many other circumstances, the real scenario is hard to measure, but symptoms of it are easily measured, and an approximation can be computed from those measurements. A couple examples come to mind:

  • You use a compass to find North, but it actually tells you the direction to something that is mostly north of you and very far away. Sailors have shown that for navigation within a few hundred miles, this is perfectly sufficient.
  • It is difficult for a robot to measure its exact position in a room, but it can easily measure how much each of its wheels has moved and then deduce its relative position via dead reckoning. For short distances, the error of this approach is negligible.
  • People say you should change your oil every six months or 6,000 miles. In reality, it's a question of the quality of the oil (how much crud has accumulated?), not how long or how far you've used it. Oil is cheap and engines are expensive, so this method is successful.
  • When structural engineers design a building, they use a safety factor---building every piece n times as strong as need be---to account for variation in manufacturing, material aging, and unforeseeable events. Steel is cheap and wrongful-death lawsuits are expensive, so this method is successful.
  • Software engineering preaches: it's more important that the software works than that it works efficiently; instead of writing in machine code, we use high level languages that favor programmer time over execution time. Programmer time is expensive, but faster computers are cheap, so this method is successful.
So, next time you design something, take another look at the approximations you are making. More importantly, look for approximations that you are not yet making. So long as you are conservative enough that your approximations hold where it matters, they can save you a lot of effort.

Maybe your robot can't read GPS, but there are other ways it can geolocate. Maybe your software cannot successfully identify some event, but if it can be right more often than not, it can pass the final say onto a human operator. Possibilities are endless.

01 February 2009

Proposal for a future Interstate Rail Network

From The Transport Politic:

The transport politic blog has run the numbers and made a proposal for a high-speed train network for the United States. I think they've done an incredible job, and so I want to mention them here. Their plan is based upon a few pillars:
  1. This network would be funded 50--90% by the Fed; compare to roads and highways that are funded 80--90% by the Fed.
  2. This funding would be part of the expected economic stimulus plans. In this respect, it sounds to me like a big WPA project instead of a bailout.
  3. A new national authority (which they call NatTrack) will be responsible for acquiring land, building and maintaining tracks, and renting those tracks to service providers such as Amtrak.
  4. They identify the ideal route as one that is (a) shorter than 500 miles, and (b) between two population centers of 100,000 or more. These numbers were chosen because such routes (a) typically beat an airplane in overall travel time, and (b) have enough potential customers to sustain the route.
  5. The main network features high-speed (150 mph) trains; this is supplemented by low-speed (70--120 mph) routes.
  6. Their route choices were computed mathematically, under some well-defined assumptions. At the end of the article, they give numeric scores for each route to justify their choices.
Again, this is excellent work. I encourage you to read their article. I hope this has a chance.

Update to pipemiter.rb

Last week I wrote about a simple script to miter pipes. As soon as I wrote it, I realized about a hundred ways it could be improved, and about a hundred new ways it could be used.

My first observation was that the old version would fail if the cutting template was too large to fit on one page. To solve this, I added pagination support---the image will be split across multiple pages, and alignment marks will be drawn at each corner so they can be stitched back together.

Next, I noticed that real pipes have some thickness, i.e. the inner diameter might be significantly smaller than the outer diameter. As a result, you can now specify an inner diameter, and it will draw a gray curve alongside the black curve of the outer diameter. This can make it easier to miter thick tubes.

Also, I realized that one often wants to miter several tubes against one. In this case, you don't just measure how to cut the smaller tubes, but also where the smaller tubes meet the larger one. For this reason, the software will now draw a guide for the large tube as well.

Beyond that, there are numerous little changes to make it easier to use or more fool-proof.

The script is now in three files:
  1. pipemiter2.rb - the main executable,
  2. units.rb - handles unit conversion in a not-too-painful way, and
  3. postscript.rb - a library for drawing postscript images.
Save all of these files to some directory. Like before, we can either invoke this program from the command line, or in interactive mode.

You specify a pipe scenario as a single center pipe (i.e. the one that doesn't get mitered), and one or more radial pipes (i.e. the ones that get cut to fit the center pipe). The center pipe is specified simply by its diameter (or equivalently, its circumference or radius). Each radial pipe is specified in 3-D polar coordinates relative to the center pipe. The characteristics of each radial pipe are:
  1. Its (outer) diameter (or equivalently, its circumference or radius);
  2. Optionally, its (inner) diameter;
  3. Its latitude, i.e. the angle at which it meets the center pipe, where a latitude of 0 means along the equator, and a latitude of +/- 90 means at the north or south pole;
  4. Its longitude, i.e. the angle around the center pipe, which defaults to 0; and
  5. Its projection, i.e. placement along the length of the center pipe.
That can be complicated, so let's go through a few examples. I'm going to write these examples in command-line format, but if you don't specify any command-line arguments the script will go into interactive mode.

First, the example from my previous post: two toilet paper rolls (1.75" diameter) meet at a 30-degree angle. All I need to specify is the center pipe, and the diameter and latitude for the radial pipe:

$ ./pipemiter2.rb 1.75 1.75 30 >eg1.ps

Notice the output is more elaborate than last time. In particular, it's now a two-page document. The first page shows the cutting profile, and the second page shows the profile on the pipe that you don't cut.

Let's try something more complicated. Suppose you wanted to build a sort of tripod-like thing. You have a center pipe 9 inches in circumference, and you have three 2-inch-diameter radial pipes. Each radial pipe meets the center pipe at a 60-degree angle, and they are evenly spaced around the circumference of the center pipe. Then you could type something like this:

$ ./pipemiter2.rb 9in-circ and 2in 60 and 2 60 by 120 and 2 60 by 240 >eg2.ps

There are a few things to note about this line. First, it specifies the center pipe (9in-circ). Next, it specifies the three radial pipes: 2in 60 by x. The word and goes between each pipe specification, but is completely optional. Each pipe is specified by its diameter (2in), its latitude (60), and its longitude (by 120, by 240, etc.). The output file has four pages---a cutting template for each of the radial pipes, and then an alignment template for the center pipe.

So what's up with the units? Units are optional, though I often write them for clarity's sake. Diameter units default to inches, but could also be mm-circ (millimeters of circumference), cm-rad (centimeters of radius), ft (feet), etc. Length units default to inches, but could also be mm, cm, m, ft, etc. Angle units default to degrees, but could also be radians. Once you use a {length, diameter, angle} unit, all future {length, diameter, angle} values default to that unit, just to save some typing. When in doubt, type the units.
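
This "last unit wins" defaulting takes only a few lines to implement. Here's a minimal sketch of the idea (hypothetical class and method names of my own; the real units.rb surely differs and also handles the -circ and -rad suffixes):

```ruby
# Sketch of sticky unit defaults: parse values like "2", "40mm", or
# "1.5ft"; a bare number reuses whichever unit was seen last.
class LengthParser
  TO_INCHES = { 'in' => 1.0, 'ft' => 12.0, 'mm' => 1 / 25.4,
                'cm' => 1 / 2.54, 'm' => 1 / 0.0254 }

  def initialize(default_unit = 'in')
    @last_unit = default_unit
  end

  # Returns the value converted to inches.
  def parse(token)
    if token =~ /\A([\d.]+)([a-z]*)\z/
      value, unit = $1.to_f, $2
      @last_unit = unit unless unit.empty?   # sticky: remember explicit units
      value * TO_INCHES.fetch(@last_unit)
    else
      raise ArgumentError, "bad length: #{token}"
    end
  end
end
```

With this, parse("2") gives 2 inches, parse("40mm") gives about 1.57 inches, and a subsequent bare parse("10") is read as 10 mm.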

So far, we've explained units, diameters, longitude, and latitude. What about projection and inner diameter? Notice that, in the previous example, the center points of each radial pipe all lined up along the equator of the center pipe. The projection parameter moves the center points up or down the length of the center pipe relative to the equator.

For example, suppose you have a center pipe of 9-inch circumference, and want to attach five 1-inch-diameter radial pipes to it, evenly spaced around the center pipe (i.e. longitudes in steps of 72 degrees). Each of the radial pipes has an inner diameter of 0.5 inches. Next, suppose that the pipes should be connected at latitudes of 60, 30, 0, -30, and -60 degrees, and that they should have projections of +0.4, +0.2, 0, -0.2, and -0.4 inches, respectively. Enter the command,

$ ./pipemiter2.rb 9in-circ \
and 1in id 0.5 60 by 0 up .4 \
and 1 id .5 30 by 72 up .2 \
and 1 id .5 0 by 144 \
and 1 id .5 -30 by 216 down .2 \
and 1 id .5 -60 by 288 down .4 >eg3.ps

The backslash character tells your shell that the command will span multiple lines. For each of the radial pipes, I use the id .5 command to specify the inner diameter of the pipe.

Example of pipe mitering, template
When you look at the output, you see that each cutting profile additionally has a gray curve. This gray curve is the shape of the miter along the inner diameter of the pipe---it gives you a few more hints about how to grind the pipes to meet perfectly. Also note how, on the sixth page, the inner diameter is drawn on the alignment guide. The projection is specified using up .2 or down .4, and causes the center of each pipe to move up or down the length of the center pipe on the alignment guide.

In case you're wondering, that last example looks like this and this.

And that's it. Now you can specify the many-pipes-meet-one scenario, and have cutting and alignment guides automatically generated. This can make mitering pipes much easier for you.

Also of note is that this is not limited to mitering pipes. I've realized that it can be used to make paper/cardboard models, and that those models can be used as molds for plaster or concrete. I'll talk about pipe sculptures in a future post.

Related: Commenter Flemming gives a link to a similar online tool.

27 January 2009

Too little, too late: plug-ins for gcc

The GCC Steering Committee announced that they have created a new license exception that will---among other things---allow for the creation of a plug-in framework in GCC. In effect it will create a means by which compiler writers, researchers and tinkerers can write additions to the compiler without having to understand the vast majority of GCC.

I argue this is too little, too late.

GCC is open source software. This means that they give everyone the blueprints, so anyone can (potentially) go out and modify it to suit their needs. Such modifications might be fixing a bug, adding a new feature, or making it work slightly better in their situation. In that respect, GCC has done a wonderful job of providing a platform on which innovation can occur. But...

Software can be complicated. People have dedicated their careers to making software easier to write, understand, modify and maintain, or even to proving that software does what it should. Despite all the effort, there's still no silver bullet. As software matures---with more features and more compatibility to maintain---it only gets harder to work with. And as software goes, compilers are typically among the most complicated to write (not to toot my own horn as a compiler writer).

GCC suffers from the trifecta of software nightmares. GCC is as old as I am, and uses ancient techniques such as an RTL intermediate representation. GCC is feature-rich; its standard release supports at least seven source languages and at least twenty target architectures. GCC is a compiler. Take a look inside and you'll find a mind-boggling nine hundred eighty-four thousand lines of code (according to conservative estimates by David Wheeler in 2001). I'm not saying that GCC is bad software, but that it would be very difficult for even the most talented programmer to learn its internals in a short time.

Software engineers have seen this growth pattern before, and have come up with schemes to try to mitigate this risk. One such scheme is a plug-in framework. In a plug-in framework, the original developers present a small, clean, easy-to-understand interface, and then anyone can write a plug-in that talks to that interface. This plug-in can do everything it needs to do, but it is significantly easier to create a plug-in than it is to modify the original code. Because of the constraints on the interface, the programmer can be blissfully ignorant of the operation of the majority of the software.

Or, as an analogy: suppose you are a doctor, and your patient has a broken ankle. It's a really bad break, and you need to operate. So, you decide to remove all of the patient's skin so you can get to the break. Oh wait, that would be ridiculous... Instead, you only open up the ankle... much less risk of disturbing something else.

With compilers, there are some easy ways to support plug-ins. The easiest is to write the intermediate representation to disk in some form, let the plug-in do whatever to it, and then read it back and verify that it is well-formed. Implementing this sort of plug-in framework is very little additional work; in any sane software effort, this might be one of the earliest things designed, if for no other reason than convenience during the debugging phase.
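
To make this concrete, here is a toy sketch of that write-out/read-back scheme in Ruby (a made-up JSON-based IR and a deliberately minimal well-formedness check; a real compiler would use its own serialization format and a full verifier):

```ruby
require 'json'
require 'tempfile'

# Toy illustration of the IR round-trip plug-in scheme: serialize the
# intermediate representation to disk, let an external plug-in rewrite
# the file in place, then read it back and check it is still well-formed.
def run_plugin(ir, plugin_command)
  file = Tempfile.new(['ir', '.json'])
  file.write(JSON.generate(ir))
  file.close
  system("#{plugin_command} #{file.path}") or raise 'plug-in failed'
  result = JSON.parse(File.read(file.path))
  verify_well_formed(result)
  result
ensure
  file.unlink if file
end

# A minimal sanity check; a real compiler would verify types, control
# flow, SSA form, etc.
def verify_well_formed(ir)
  raise 'malformed IR' unless ir.is_a?(Array) &&
    ir.all? { |insn| insn.is_a?(Hash) && insn.key?('op') }
end
```

The point is how little machinery this takes: the compiler's only obligations are a serializer and a verifier, and the plug-in can be any program in any language.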

But, for some reason, GCC has not supported plug-ins until now. Twenty-five years of software enthusiasts suggesting it, and the project hasn't done it. Why would GNU ignore feature requests, and even software engineering best practices?

From the GCC Exception FAQ:
...For a while now, the GCC developers have considered adding a plugin framework to the compiler. This would make it easier for others to contribute to the project, and accelerate the development of new compilation techniques for GCC. However, there have also been concerns that unscrupulous developers could write plugins that called out to proprietary software to transform the compiled code—effectively creating proprietary extensions to GCC and defeating the purpose of the GPL. The updated exception prevents such abuse, enabling the GCC team to look forward to plugin developments. [emphasis is mine].
Yes, that is correct. They were afraid people would write software plug-ins---completely separate from GCC---and that those plug-ins would not be free software. They think that any software that interacts with GCC should be governed by GCC's license. In effect, GNU and the GCC project tried to strong-arm other developers into writing free software. I'm all for free software, but I think this was (1) a prick move, and (2) the wrong decision from a project management perspective.

Why is this a prick move?

In short, I don't like having my hand forced. Let's ask again: what is open source? According to the Open Source Initiative's (OSI) definition (which is about as close as we'll get to a standard, accepted definition):

6. No Discrimination Against Fields of Endeavor

The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.


8. License Must Not Be Specific to a Product

The rights attached to the program must not depend on the program's being part of a particular software distribution. If the program is extracted from that distribution and used or distributed within the terms of the program's license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the original software distribution.

9. License Must Not Restrict Other Software

The license must not place restrictions on other software that is distributed along with the licensed software. For example, the license must not insist that all other programs distributed on the same medium must be open-source software.


Although this definition has been carefully crafted to dodge the question, it seems to me that GNU could not place the no-proprietary-plug-in restriction in their license, and so instead they mutilated their code to discourage such unscrupulous behavior.

But, ultimately, my code is my code, and my code is separate from GCC. The GCC project cannot accept that, and so they try to prevent me from writing it.

Why was this a bad project management decision?

For all of the reasons that make plug-ins a good idea, and then some.

I presented the notion of plug-ins as a best practice---and they are---but realize that all best-practices are just rules-of-thumb. We could never get universal agreement that GCC is bad code because it doesn't support plug-ins. In my opinion, the lack of this feature is a flaw in GCC, but there are bigger problems.

At the end of the day, I don't care if open source software is good code. Authors can write code however they want to. But it doesn't mean I'm going to take part in their software project if they are working against me. If you write code to prevent me from participating, then it cannot be open source.

As academics researching compilers (that's fun to say), we have simply moved away from GCC as an experimentation platform. There are compilers with less restrictive licenses that actively encourage third-party development. This, ultimately, is GCC's loss and not ours.

Take for instance llvm, introduced by Chris Lattner's thesis in 2002. The llvm project not only has better documentation (including howtos on writing plugins), but also has a bit of a social contract. From the llvm intermediate representation documentation:

The LLVM code representation is designed to be used in three different forms: as an in-memory compiler IR, as an on-disk bitcode representation (suitable for fast loading by a Just-In-Time compiler), and as a human readable assembly language representation. This allows LLVM to provide a powerful intermediate representation for efficient compiler transformations and analysis, while providing a natural means to debug and visualize the transformations. The three different forms of LLVM are all equivalent. [emphasis is mine].
Or, in other words: "yeah, we're not going to pull that crap." Because of all of the llvm project's support, llvm is quickly becoming a popular platform for compiler research, and will assuredly overtake GCC there (if it hasn't already). I'm not saying that we don't use GCC; rather that the next hundred cool optimizations are being developed for other compilers, and will only later be back-ported to GCC.

In summary, an important part of the management of an open-source software project should be to enable new programmers to enter the project with little initial investment. GCC has not only missed this opportunity, but actively avoided it. As a result, GCC has alienated some of the community, and is losing ground to newer open source projects.

22 January 2009

Easily miter tubes

[Update: see version 2]

Have you ever wanted to join two tubes at some specific angle? For example, imagine you were building a bicycle frame. To ensure a good clean weld, you want to miter one of the tubes so that it perfectly fits against the surface of the other. This, unfortunately, can be a pain in the butt.

So, I created a quick and dirty ruby script to do the math for me. I enter the diameter of the two tubes, and the angle at which they should join, and it draws the curve as a postscript document. I can then print that template, tape it to the smaller tube, and cut along its length.
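
The math behind the template is straightforward. A point at angle phi around the small tube's circumference sits a horizontal distance r*cos(phi) from the center tube's axis; solving for where a line in the small tube's axis direction hits the center tube's surface gives a closed-form cut depth. Here is a minimal sketch of that calculation (my own reconstruction of the geometry, not the script's actual code; the names are made up):

```ruby
# A tube of radius r meets a tube of radius big_r, their axes at angle
# theta (radians). For each angle phi around the small tube, compute how
# far along the small tube's axis the cut must reach.
def miter_depth(r, big_r, theta, phi)
  # Horizontal offset of this surface point from the center tube's axis.
  y = r * Math.cos(phi)
  # Distance along the small tube's axis to the center tube's surface.
  (Math.sqrt(big_r**2 - y**2) - r * Math.sin(phi) * Math.cos(theta)) /
    Math.sin(theta)
end

# Unroll the circumference into a flat template: x is the arc length
# r*phi, y is the cut depth (shifted so the shallowest point sits at 0).
def template_points(r, big_r, theta, steps = 64)
  depths = (0..steps).map { |i| miter_depth(r, big_r, theta, 2 * Math::PI * i / steps) }
  base = depths.min
  (0..steps).map { |i| [r * 2 * Math::PI * i / steps, depths[i] - base] }
end
```

As a sanity check: for two equal tubes meeting at 90 degrees, the formula reduces to r*|sin(phi)|, the classic "fishmouth" curve.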

Say for instance I wanted to join two toilet paper rolls (1.75" diameter) at 30 degrees. I simply type:

$ ./pipemiter.rb 1.75 1.75 30 in >tp.ps

or, if you prefer something more guided and interactive,

$ ./pipemiter.rb >tp.ps
Preferred unit (one of mm, m, ft, in, cm): in
Outer Diameter of larger pipe: 1.75
Outer Diameter of smaller pipe: 1.75
Angle of joint (degrees): 30

and it gives me a postscript document, which looks like this curve on the right.

As a proof of principle, I tried it out with two toilet paper rolls (to your left). They join perfectly!

The source code for the script is available for free.

20 January 2009

Happy OBAMA day!

Or, more importantly, happy NO MORE BUSH DAY!

Know that your people will judge you on what you can build, not on what you can destroy.