Thoughts concerning API-Documentation: July 2011

The identifier of an operation plays an important role in understanding both the code that uses the operation and the operation itself. An operation usually stands for a bundle of instructions or commands that the computer should execute. I believe this is why programmers name operations with an imperative phrase (e.g. "get the customer by id", "generate invoice" or "compute price") or a nominalized verb phrase ("customer id filter", "invoice generation" or "price computation"). This style seems to me to be widely established and recommended (see Robert C. Martin's "Clean Code", page 25 and following).

Sometimes I observe operation identifiers that contain two verbs acting as predicates and linked by a conjunction, such as "generate and send invoice". I remember that I have intuitively tried to name operations this way three or four times myself. Consider this piece of code as an example (in Java syntax):

public static void generateAndSendInvoice(Order order) throws IOException {
    Document invoice = generateInvoice(order);
    sendInvoice(invoice, order.getCustomer());
}

Since operations often contain many commands, it seems strange to me to mention exactly two of them explicitly in the identifier. I have several reasons for this:

I do not really understand the criteria by which the actions mentioned in the identifier are selected from all the instructions in the operation. If these criteria cannot be described, they cannot be added to a project's naming conventions. This could lead to different naming styles within one project and decrease the readability of the code.
I think that identifiers become very confusing if they reflect the internal control structure of an operation. Consider the above example with a condition: the invoice should only be sent if its recipient is verified, which yields "generate and send invoice if recipient is ok". Again, I think programs containing this kind of identifier become hard to read.
The identifier represents what the operation is responsible for. From my point of view, the responsibility for generating the invoice remains with the operation "generateInvoice". The identifier "generateAndSendInvoice" implies that this operation is also responsible for the generation, which is not the case.
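To make the second reason concrete, here is a minimal Java sketch of what the conditional variant might look like. The types Customer, Order and Document, the "verified" flag and the in-memory recipientsMailed list are hypothetical stand-ins invented for this illustration; only the operation names come from the example above.

```java
import java.util.ArrayList;
import java.util.List;

class ConditionalInvoiceNaming {
    // Hypothetical minimal stand-ins for the types used in the example above.
    static final class Customer {
        final String name;
        final boolean verified;
        Customer(String name, boolean verified) {
            this.name = name;
            this.verified = verified;
        }
    }

    static final class Order {
        private final Customer customer;
        Order(Customer customer) { this.customer = customer; }
        Customer getCustomer() { return customer; }
    }

    static final class Document {
        final String text;
        Document(String text) { this.text = text; }
    }

    // Records who was mailed; an assumption that replaces a real mail transport.
    static final List<String> recipientsMailed = new ArrayList<>();

    static Document generateInvoice(Order order) {
        return new Document("Invoice for " + order.getCustomer().name);
    }

    static void sendInvoice(Document invoice, Customer recipient) {
        recipientsMailed.add(recipient.name);
    }

    // The identifier now spells out both the sequence and the condition of the body.
    // Every further branch in the body would force another clause into the name.
    static void generateAndSendInvoiceIfRecipientIsOk(Order order) {
        Document invoice = generateInvoice(order);
        if (order.getCustomer().verified) {
            sendInvoice(invoice, order.getCustomer());
        }
    }

    public static void main(String[] args) {
        generateAndSendInvoiceIfRecipientIsOk(new Order(new Customer("Alice", true)));
        generateAndSendInvoiceIfRecipientIsOk(new Order(new Customer("Bob", false)));
        System.out.println(recipientsMailed); // only the verified recipient is listed
    }
}
```

The name is already longer than the signature it belongs to, and it still says nothing about what "ok" means for a recipient.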

I asked myself why I wanted to label operations with two predicates. Actually, I did not find a sophisticated answer to that question. I have the impression that programmers usually choose identifiers to express the intention or the final result of an operation. This is in line with observations made by Robert C. Martin (see page 18 of his book "Clean Code"). For example, the operation "computePrice" may compute every summand included in the final price, but its identifier does not express this. Identifiers with two predicates break this convention, since they describe not only the intended action but also its intermediate steps (e.g. the computation of summands or the generation of an invoice). One reason for this kind of extended identifier might be that the identifier "send invoice" does not tell the whole story. Imagine that this operation not only submits an invoice as mail via SMTP, but also creates a printed invoice from a PDF document. Since the verb "to send" has a strong meaning in the context of software engineering, the programmer might think that this identifier would confuse the other team members. As a solution to this dilemma, the identifier has been extended by mentioning the generation of the invoice document: "generate and send invoice".

How could this dilemma be resolved? From my point of view, there are strategies other than extending the identifier to more than one predicate (this list may be incomplete):

The programmer could generalize the verb so that it covers both actions: the invoice generation and the submission. For the above example such a verb could be "to charge", e.g. "charge order".
A second option is to specialize the object, e.g. with an adjective. Submitting the invoice to the customer is the final intention of the operation, so it could be named "send ready-to-print invoice". Since the invoice is not part of the operation's signature, the identifier implies that the operation obtains the invoice in another way.
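Both strategies can be sketched in Java as follows. This is only an illustration under assumptions: the names chargeOrder and sendReadyToPrintInvoice are derived from the phrases above, while the types and the in-memory recipientsMailed list are hypothetical placeholders. The bodies are deliberately identical, so that only the identifiers differ.

```java
import java.util.ArrayList;
import java.util.List;

class InvoiceRenaming {
    // Hypothetical minimal stand-ins for the types used in the original example.
    static final class Customer {
        final String name;
        Customer(String name) { this.name = name; }
    }

    static final class Order {
        private final Customer customer;
        Order(Customer customer) { this.customer = customer; }
        Customer getCustomer() { return customer; }
    }

    static final class Document {
        final String text;
        Document(String text) { this.text = text; }
    }

    // Records who was mailed; an assumption that replaces a real mail transport.
    static final List<String> recipientsMailed = new ArrayList<>();

    static Document generateInvoice(Order order) {
        return new Document("Invoice for " + order.getCustomer().name);
    }

    static void sendInvoice(Document invoice, Customer recipient) {
        recipientsMailed.add(recipient.name);
    }

    // Strategy 1: one generalized verb covers generation and submission.
    static void chargeOrder(Order order) {
        Document invoice = generateInvoice(order);
        sendInvoice(invoice, order.getCustomer());
    }

    // Strategy 2: keep the verb "send" and specialize the object instead.
    // The signature takes an Order, so the name hints that the invoice is obtained internally.
    static void sendReadyToPrintInvoice(Order order) {
        Document invoice = generateInvoice(order);
        sendInvoice(invoice, order.getCustomer());
    }

    public static void main(String[] args) {
        chargeOrder(new Order(new Customer("Alice")));
        sendReadyToPrintInvoice(new Order(new Customer("Bob")));
        System.out.println(recipientsMailed);
    }
}
```

In both variants the identifier keeps a single predicate, and generateInvoice stays the only operation whose name claims responsibility for generating the document.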

I conclude that it is hard work for programmers to find short and concise names for operations. I described two strategies for refactoring names with two verbs. All in all, I think there is a strong need for a set of naming conventions that goes beyond technical terms such as delete, find, create and so on. Such a verb lexicon should define, for each verb, what an operation labeled with that verb is allowed to do, and vice versa. The 19 thematic grids included in the current iDocIt! release could be a starting point for projects to define their lexicons.
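As a rough idea of what such a verb lexicon might look like in code, here is a small Java sketch. Everything in it is an assumption made for illustration: the Effect categories, the four verbs and their permitted effects are invented here and are not taken from the iDocIt! thematic grids.

```java
import java.util.EnumSet;
import java.util.Set;

class VerbLexicon {
    // Hypothetical categories of what an operation may do.
    enum Effect { READS_STATE, CREATES_OBJECT, REMOVES_OBJECT, SENDS_MESSAGE }

    // Hypothetical lexicon entries: each verb lists the effects it permits.
    enum Verb {
        FIND(EnumSet.of(Effect.READS_STATE)),
        CREATE(EnumSet.of(Effect.READS_STATE, Effect.CREATES_OBJECT)),
        DELETE(EnumSet.of(Effect.READS_STATE, Effect.REMOVES_OBJECT)),
        CHARGE(EnumSet.of(Effect.READS_STATE, Effect.CREATES_OBJECT, Effect.SENDS_MESSAGE));

        private final Set<Effect> allowed;

        Verb(Set<Effect> allowed) { this.allowed = allowed; }

        // Verb -> behavior: may an operation labeled with this verb have these effects?
        boolean permits(Set<Effect> actual) {
            return allowed.containsAll(actual);
        }
    }

    // Behavior -> verb: which verbs of the lexicon fit an operation with these effects?
    static Set<Verb> verbsPermitting(Set<Effect> actual) {
        Set<Verb> result = EnumSet.noneOf(Verb.class);
        for (Verb verb : Verb.values()) {
            if (verb.permits(actual)) {
                result.add(verb);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Effect> generateAndSend = EnumSet.of(Effect.CREATES_OBJECT, Effect.SENDS_MESSAGE);
        System.out.println(Verb.CREATE.permits(generateAndSend));
        System.out.println(verbsPermitting(generateAndSend));
    }
}
```

With these invented entries, an operation that both creates a document and sends a message fits only the verb "charge", which is how a lexicon could steer a team away from "generate and send".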

This post was inspired massively by discussions with my colleagues Timm Schwemann, Julian Sirapanji (both Hamburger Berater Team GmbH) and Stefan Jockenhövel (Paragon Systemhaus GmbH). Thanks a lot ;).


Sunday, 31 July 2011

The "multi verb in identifiers" dilemma and strategies for solving it