Using the builder pattern with subclasses

[Back when I worked for Sun Microsystems, I had an occasional blog. It survived the Great Absorption into Oracle for a while, but it now seems to have gone the way of all things. Here's an entry from it, fished out of the Internet Archive.]

Josh Bloch's Effective Java popularized the Builder Pattern as a more palatable way of constructing objects than constructors or factory methods when there are potentially many constructor parameters. The formulation in Effective Java makes for a particularly readable construction, like this:

new Rectangle.Builder().height(250).width(300).color(PINK).build();

The advantage over a constructor invocation like new Rectangle(250, 300, PINK); here is that you don't have to guess whether 250 is the height or the width. More generally, if the constructor allowed you to specify many other parameters — such as position, opacity, transforms, effects, and so on — it would quickly get very messy. The builder pattern stays clean.

But one question that arises is: how does it work in the presence of inheritance? For example, suppose you have an abstract Shape class that represents an arbitrary graphical shape, with a set of properties that are common to all shapes, such as opacity and transforms. And suppose you have a number of concrete subclasses such as Rectangle, Circle, Path, and so on, each with its own properties, like Rectangle's height and width.

As a reminder, here's what the builder pattern looks like in the absence of subclassing:

public class Rectangle {
    private final double opacity;
    private final double height;
    ...

    public static class Builder {
        private double opacity;
        private double height;
        ...

        public Builder opacity(double opacity) {
            this.opacity = opacity;
            return this;
        }

        public Builder height(double height) {
            this.height = height;
            return this;
        }
        ...

        public Rectangle build() {
            return new Rectangle(this);
        }
    }

    private Rectangle(Builder builder) {
        this.opacity = builder.opacity;
        this.height = builder.height;
        ...
    }
}
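For reference, here is a self-contained version of that builder that compiles and runs as is. It is a sketch that keeps only the opacity and height properties (the elided ones are left out) and adds getOpacity()/getHeight() accessors and a BasicBuilderDemo class, none of which are part of the original code.

```java
// Non-subclassing builder, trimmed to two properties.
// getOpacity(), getHeight() and BasicBuilderDemo are illustrative additions.
class Rectangle {
    private final double opacity;
    private final double height;

    public static class Builder {
        private double opacity;
        private double height;

        public Builder opacity(double opacity) {
            this.opacity = opacity;
            return this;   // returning the builder is what makes chaining possible
        }

        public Builder height(double height) {
            this.height = height;
            return this;
        }

        public Rectangle build() {
            return new Rectangle(this);
        }
    }

    // Private: the only way to get a Rectangle is through its Builder.
    private Rectangle(Builder builder) {
        this.opacity = builder.opacity;
        this.height = builder.height;
    }

    double getOpacity() { return opacity; }
    double getHeight() { return height; }
}

class BasicBuilderDemo {
    public static void main(String[] args) {
        // Each value is labeled, and the calls can come in any order.
        Rectangle r = new Rectangle.Builder().height(250).opacity(0.5).build();
        System.out.println(r.getHeight() + " " + r.getOpacity()); // prints "250.0 0.5"
    }
}
```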

Subclassing

Now what happens if we want to introduce a Shape superclass, and move some of the properties of Rectangle into it? Let's just concentrate on opacity and height. We'll move opacity into Shape (all shapes have opacity) and leave height in Rectangle (a circle, for example, is defined by its radius rather than its height).

Here's a first attempt. Obviously we are no longer going to be able to keep our constructors private, at least not in Shape, since a subclass needs to be able to invoke its superclass's constructor. So we'll make it protected, and we get this:

// First attempt at Shape/Rectangle separation.  This does not work!
public class Shape {
    private final double opacity;

    public static class Builder {
        private double opacity;

        public Builder opacity(double opacity) {
            this.opacity = opacity;
            return this;
        }

        public Shape build() {
            return new Shape(this);
        }
    }

    protected Shape(Builder builder) {
        this.opacity = builder.opacity;
    }
}

public class Rectangle extends Shape {
    private final double height;

    public static class Builder extends Shape.Builder {
        private double height;

        public Builder height(double height) {
            this.height = height;
            return this;
        }

        @Override
        public Rectangle build() {
            return new Rectangle(this);
        }
    }

    protected Rectangle(Builder builder) {
        super(builder);
        this.height = builder.height;
    }
}

That looks pretty simple. Rectangle.Builder extends Shape.Builder, so it inherits the opacity method and adds its own height method. It overrides the build method to return a Rectangle created by its own Rectangle(Builder) constructor. That constructor passes its Builder to the superclass constructor, so Shape can set the opacity of the new object, and Rectangle sets the height. So what's the problem?

Say we write:

Rectangle r = new Rectangle.Builder().opacity(0.5).height(250).build();

It doesn't compile. The reason is slightly obscured by the reuse of the name Builder in both Shape.Builder and Rectangle.Builder. Rectangle.Builder inherits its opacity method from Shape.Builder. That method is declared to return Shape.Builder. But Shape.Builder does not have a height method. Even though the actual Rectangle.Builder object that we are using does have a height method, the compiler uses the declared type of opacity(0.5), which as we just saw is Shape.Builder.

Suppose we required our callers to put the methods in order from subclass to superclass (which would be a very unpleasant requirement). So here the caller would have to write:

Rectangle r = new Rectangle.Builder().height(250).opacity(0.5).build();

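But that still doesn't help. opacity(0.5) still returns Shape.Builder, so the compiler resolves the build() at the end against Shape.Builder.build(), whose declared return type is Shape, not Rectangle. (At run time the call would actually dispatch to Rectangle.Builder.build(), but that doesn't matter: the assignment to Rectangle r is rejected at compile time.)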
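Both failures are about static types, not about the objects themselves. The following sketch makes that visible using the first-attempt classes, with Rectangle's constructor reading builder.height. The getters and the FirstAttemptDemo class are illustrative additions, and the calls the compiler would reject are described in comments rather than written out, so the block itself compiles.

```java
// First-attempt classes: Rectangle.Builder extends Shape.Builder directly.
class Shape {
    private final double opacity;

    public static class Builder {
        private double opacity;

        public Builder opacity(double opacity) {
            this.opacity = opacity;
            return this;               // declared type: Shape.Builder
        }

        public Shape build() {
            return new Shape(this);
        }
    }

    protected Shape(Builder builder) {
        this.opacity = builder.opacity;
    }

    double getOpacity() { return opacity; }
}

class Rectangle extends Shape {
    private final double height;

    public static class Builder extends Shape.Builder {
        private double height;

        public Builder height(double height) {
            this.height = height;
            return this;
        }

        @Override
        public Rectangle build() {
            return new Rectangle(this);
        }
    }

    protected Rectangle(Builder builder) {
        super(builder);
        this.height = builder.height;
    }

    double getHeight() { return height; }
}

class FirstAttemptDemo {
    public static void main(String[] args) {
        // The compiler only knows that opacity() returns a Shape.Builder,
        // so calling .height(250) on b would not compile...
        Shape.Builder b = new Rectangle.Builder().opacity(0.5);
        // ...even though the object really is a Rectangle.Builder.
        System.out.println(b instanceof Rectangle.Builder); // prints "true"

        // Subclass-first order: build() is resolved against Shape.Builder,
        // so the static type is Shape and "Rectangle s = ..." would not compile.
        Shape s = new Rectangle.Builder().height(250).opacity(0.5).build();
        // At run time the call still dispatches to Rectangle.Builder.build().
        System.out.println(s instanceof Rectangle); // prints "true"
    }
}
```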

A nasty solution

One way out is for Rectangle.Builder to redeclare opacity:

public class Rectangle extends Shape {
...
    public static class Builder extends Shape.Builder {
        ...
        @Override
        public Builder opacity(double opacity) {
            super.opacity(opacity);
            return this;
        }
        ...

That does fix our problem. But it's very nasty. It means that every time you add a property to Shape you have to visit all its subclasses, and all its subclasses' subclasses, and so on, to add a new color(c) or whatever method to all their Builder classes. It's hardly worth using inheritance at all if you end up doing things like that.
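For completeness, here is a runnable sketch of the redeclaration workaround. It does compile with the opacity-first call order, but note the extra opacity override that Rectangle.Builder needs purely to narrow the return type; each further Shape property would need another such override in every subclass builder. The getters and NastyDemo are illustrative additions.

```java
class Shape {
    private final double opacity;

    public static class Builder {
        private double opacity;

        public Builder opacity(double opacity) {
            this.opacity = opacity;
            return this;
        }

        public Shape build() {
            return new Shape(this);
        }
    }

    protected Shape(Builder builder) {
        this.opacity = builder.opacity;
    }

    double getOpacity() { return opacity; }
}

class Rectangle extends Shape {
    private final double height;

    public static class Builder extends Shape.Builder {
        private double height;

        public Builder height(double height) {
            this.height = height;
            return this;
        }

        // The boilerplate: re-declare the inherited setter with a covariant
        // return type, only so that chaining keeps the Rectangle.Builder type.
        @Override
        public Builder opacity(double opacity) {
            super.opacity(opacity);
            return this;
        }

        @Override
        public Rectangle build() {
            return new Rectangle(this);
        }
    }

    protected Rectangle(Builder builder) {
        super(builder);
        this.height = builder.height;
    }

    double getHeight() { return height; }
}

class NastyDemo {
    public static void main(String[] args) {
        // Compiles now, in either order.
        Rectangle r = new Rectangle.Builder().opacity(0.5).height(250).build();
        System.out.println(r.getOpacity() + " " + r.getHeight()); // prints "0.5 250.0"
    }
}
```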

A better solution

There is a better solution, which allows us to do what we want without having to pollute subclasses with their superclass's properties. The main drawback is that it uses mind-bending generics declarations similar to Enum<E extends Enum<E>>. But it works. Here's the code:

public class Shape {
    private final double opacity;

    protected static abstract class Init<T extends Init<T>> {
        private double opacity;

        protected abstract T self();

        public T opacity(double opacity) {
            this.opacity = opacity;
            return self();
        }

        public Shape build() {
            return new Shape(this);
        }
    }

    public static class Builder extends Init<Builder> {
        @Override
        protected Builder self() {
            return this;
        }
    }

    protected Shape(Init<?> init) {
        this.opacity = init.opacity;
    }
}

public class Rectangle extends Shape {
    private final double height;

    protected static abstract class Init<T extends Init<T>> extends Shape.Init<T> {
        private double height;

        public T height(double height) {
            this.height = height;
            return self();
        }

        public Rectangle build() {
            return new Rectangle(this);
        }
    }

    public static class Builder extends Init<Builder> {
        @Override
        protected Builder self() {
            return this;
        }
    }

    protected Rectangle(Init<?> init) {
        super(init);
        this.height = init.height;
    }
}

The idea is that instead of hardwiring opacity() to return the type of the Builder that defines it, we introduce a type parameter T and we return T. The self-referential definition Init<T extends Init<T>> is what allows us to make the type of the inherited opacity() in Rectangle.Builder be Rectangle.Builder rather than Shape.Builder.

We can no longer simply return this from opacity(), since at the point where it is defined, this is an Init, not a T. So instead of this, we return self(), and we arrange for self() to be overridden so that it returns the appropriate this. This is pure ceremony to keep the compiler happy: all of the Builder classes have identical definitions of self(). (This is what Angelika Langer calls the getThis() trick, citing Maurice Naftalin and Philip Wadler for the name and Heinz Kabutz for the first publication.)

Why do we need to split our previous Builder class into separate Init and Builder classes? Because we still want the caller to be able to write:

Rectangle r = new Rectangle.Builder().opacity(0.5).height(250).build();

If Builder were the class with the self-referential type (Builder<T extends Builder<T>>) then new Rectangle.Builder() would be missing a type argument. And even if we were prepared to bother every caller with having to supply such an argument, what would it be? We cannot write new Builder<Builder>() because then the second Builder is missing a type argument! That's why we need one class that has the self-referential <T> parameter (so opacity() can return T) and another one that doesn't (so callers can make instances of it).
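Putting the pieces together, here is the Init/Builder version in a form that compiles and runs, with hypothetical getOpacity()/getHeight() accessors and an InitBuilderDemo class added. The thing to notice is that new Rectangle.Builder().opacity(0.5) now has the static type Rectangle.Builder, so height() can follow it.

```java
class Shape {
    private final double opacity;

    // T stands for "the concrete builder type", so opacity() can return it.
    protected static abstract class Init<T extends Init<T>> {
        private double opacity;

        protected abstract T self();

        public T opacity(double opacity) {
            this.opacity = opacity;
            return self();
        }

        public Shape build() {
            return new Shape(this);
        }
    }

    // Fixes T to itself, so callers can write new Shape.Builder() with no type argument.
    public static class Builder extends Init<Builder> {
        @Override
        protected Builder self() {
            return this;
        }
    }

    protected Shape(Init<?> init) {
        this.opacity = init.opacity;
    }

    double getOpacity() { return opacity; }
}

class Rectangle extends Shape {
    private final double height;

    // Inside Rectangle, "Init" means Rectangle.Init (it hides Shape.Init).
    protected static abstract class Init<T extends Init<T>> extends Shape.Init<T> {
        private double height;

        public T height(double height) {
            this.height = height;
            return self();
        }

        @Override
        public Rectangle build() {
            return new Rectangle(this);
        }
    }

    public static class Builder extends Init<Builder> {
        @Override
        protected Builder self() {
            return this;
        }
    }

    protected Rectangle(Init<?> init) {
        super(init);
        this.height = init.height;
    }

    double getHeight() { return height; }
}

class InitBuilderDemo {
    public static void main(String[] args) {
        // The inherited opacity() returns Rectangle.Builder, so height() is visible.
        Rectangle.Builder b = new Rectangle.Builder().opacity(0.5);
        Rectangle r = b.height(250).build();
        System.out.println(r.getOpacity() + " " + r.getHeight()); // prints "0.5 250.0"
    }
}
```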

Better still: static factory instead of constructor

It isn't very nice that users can see both the Builder and Init classes. Is there a way to hide one of them?

The answer is yes, if we say that the way to build a rectangle is this:

Rectangle r = Rectangle.builder().opacity(0.5).height(250).build();

That is, the caller gets a builder from the static method Rectangle.builder() rather than with new Rectangle.Builder(). Here's what the modified code looks like:

public class Shape {
    private final double opacity;

    public static abstract class Builder<T extends Builder<T>> {
        private double opacity;

        protected abstract T self();

        public T opacity(double opacity) {
            this.opacity = opacity;
            return self();
        }

        public Shape build() {
            return new Shape(this);
        }
    }

    private static class Builder2 extends Builder<Builder2> {
        @Override
        protected Builder2 self() {
            return this;
        }
    }

    public static Builder<?> builder() {
        return new Builder2();
    }

    protected Shape(Builder<?> builder) {
        this.opacity = builder.opacity;
    }
}

public class Rectangle extends Shape {
    private final double height;

    public static abstract class Builder<T extends Builder<T>> extends Shape.Builder<T> {
        private double height;

        public T height(double height) {
            this.height = height;
            return self();
        }

        public Rectangle build() {
            return new Rectangle(this);
        }
    }

    private static class Builder2 extends Builder<Builder2> {
        @Override
        protected Builder2 self() {
            return this;
        }
    }

    public static Builder<?> builder() {
        return new Builder2();
    }

    protected Rectangle(Builder<?> builder) {
        super(builder);
        this.height = builder.height;
    }
}

This is my preferred version.
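Here is a runnable sketch of this preferred version, with illustrative getters and a PreferredDemo class added. Rectangle.builder() returns Rectangle.Builder<?>; the compiler captures the wildcard as some type X that extends Rectangle.Builder<X>, so opacity() and height() both return X and the chain ends in Rectangle.Builder.build(), which returns a Rectangle.

```java
class Shape {
    private final double opacity;

    public static abstract class Builder<T extends Builder<T>> {
        private double opacity;

        protected abstract T self();

        public T opacity(double opacity) {
            this.opacity = opacity;
            return self();
        }

        public Shape build() {
            return new Shape(this);
        }
    }

    // Hidden concrete builder: exists only so there is something to instantiate.
    private static class Builder2 extends Builder<Builder2> {
        @Override
        protected Builder2 self() {
            return this;
        }
    }

    public static Builder<?> builder() {
        return new Builder2();
    }

    protected Shape(Builder<?> builder) {
        this.opacity = builder.opacity;
    }

    double getOpacity() { return opacity; }
}

class Rectangle extends Shape {
    private final double height;

    public static abstract class Builder<T extends Builder<T>> extends Shape.Builder<T> {
        private double height;

        public T height(double height) {
            this.height = height;
            return self();
        }

        @Override
        public Rectangle build() {
            return new Rectangle(this);
        }
    }

    private static class Builder2 extends Builder<Builder2> {
        @Override
        protected Builder2 self() {
            return this;
        }
    }

    // Hides Shape.builder(); Rectangle.Builder<?> is a subtype of Shape.Builder<?>.
    public static Builder<?> builder() {
        return new Builder2();
    }

    protected Rectangle(Builder<?> builder) {
        super(builder);                 // Shape sets opacity
        this.height = builder.height;   // Rectangle sets height
    }

    double getHeight() { return height; }
}

class PreferredDemo {
    public static void main(String[] args) {
        Rectangle r = Rectangle.builder().opacity(0.5).height(250).build();
        System.out.println(r.getOpacity() + " " + r.getHeight()); // prints "0.5 250.0"
    }
}
```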

A shorter, smellier variant

There is a way to avoid having to define a second builder class for every class you want to construct, but it is a bit nasty. I won't spell it out like the other cases, but here's the gist:

public class Shape {
    ...
    public static class Builder<T extends Builder<T>> {
        ...
        @SuppressWarnings("unchecked")  // Smell 1
        protected T self() {
            return (T) this;            // Unchecked cast!
        }
        ...
    }

    @SuppressWarnings("rawtypes")       // Smell 2
    public static Builder<?> builder() {
        return new Builder();           // Raw type - no type argument!
    }
    ...
}

public class Rectangle extends Shape {
    ...
    public static class Builder<T extends Builder<T>> extends Shape.Builder<T> {
        ... no need to define self() ...
    }

    @SuppressWarnings("rawtypes")
    public static Builder<?> builder() {
        return new Builder();
    }
    ...
}
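Filled out for the two-class example, the smelly variant might look like this. It is a sketch: the parts hidden behind "..." in the gist are assumed to match the preferred version, and the getters and SmellyDemo class are additions. Nothing checks that T really is the class of this; the cast in self() simply trusts it, which is why the unchecked warning has to be suppressed.

```java
class Shape {
    private final double opacity;

    // Concrete (not abstract), so builder() can instantiate it as a raw type.
    public static class Builder<T extends Builder<T>> {
        private double opacity;

        @SuppressWarnings("unchecked")  // Smell 1: trusts that this is a T
        protected T self() {
            return (T) this;
        }

        public T opacity(double opacity) {
            this.opacity = opacity;
            return self();
        }

        public Shape build() {
            return new Shape(this);
        }
    }

    @SuppressWarnings("rawtypes")       // Smell 2: no type argument can be written here
    public static Builder<?> builder() {
        return new Builder();
    }

    protected Shape(Builder<?> builder) {
        this.opacity = builder.opacity;
    }

    double getOpacity() { return opacity; }
}

class Rectangle extends Shape {
    private final double height;

    // No self() override needed: the inherited cast does the job.
    public static class Builder<T extends Builder<T>> extends Shape.Builder<T> {
        private double height;

        public T height(double height) {
            this.height = height;
            return self();
        }

        @Override
        public Rectangle build() {
            return new Rectangle(this);
        }
    }

    @SuppressWarnings("rawtypes")
    public static Builder<?> builder() {
        return new Builder();
    }

    protected Rectangle(Builder<?> builder) {
        super(builder);
        this.height = builder.height;
    }

    double getHeight() { return height; }
}

class SmellyDemo {
    public static void main(String[] args) {
        Rectangle r = Rectangle.builder().opacity(0.5).height(250).build();
        System.out.println(r.getOpacity() + " " + r.getHeight()); // prints "0.5 250.0"
    }
}
```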

Some notes

I showed the opacity and height fields as final because fields should be final whenever possible, and because it demonstrates that this pattern works correctly with final fields. But of course it works with non-final fields too.

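If the Shape class were abstract, we would omit its builder's build() method, and also whatever exists only so that plain Shape builders can be created: the concrete Shape.Builder class in the Init version, or the Builder2 class and static builder() method in the static-factory version. Otherwise everything would be the same.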
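A sketch of that abstract case in the static-factory style, assuming a hypothetical concrete Circle subclass with a radius property (following the earlier remark that a circle is defined by its radius). Shape.Builder keeps opacity() and self() but has no build(), and Shape has no Builder2 or builder(). The getters and AbstractShapeDemo are illustrative additions.

```java
abstract class Shape {
    private final double opacity;

    public static abstract class Builder<T extends Builder<T>> {
        private double opacity;

        protected abstract T self();

        public T opacity(double opacity) {
            this.opacity = opacity;
            return self();
        }
        // No build(): there is no such thing as a plain Shape to build.
    }

    // No Builder2 and no static builder() either.

    protected Shape(Builder<?> builder) {
        this.opacity = builder.opacity;
    }

    double getOpacity() { return opacity; }
}

// Hypothetical concrete subclass.
class Circle extends Shape {
    private final double radius;

    public static abstract class Builder<T extends Builder<T>> extends Shape.Builder<T> {
        private double radius;

        public T radius(double radius) {
            this.radius = radius;
            return self();
        }

        // Not an override any more, since Shape.Builder declares no build().
        public Circle build() {
            return new Circle(this);
        }
    }

    private static class Builder2 extends Builder<Builder2> {
        @Override
        protected Builder2 self() {
            return this;
        }
    }

    public static Builder<?> builder() {
        return new Builder2();
    }

    protected Circle(Builder<?> builder) {
        super(builder);
        this.radius = builder.radius;
    }

    double getRadius() { return radius; }
}

class AbstractShapeDemo {
    public static void main(String[] args) {
        Circle c = Circle.builder().opacity(0.5).radius(10).build();
        System.out.println(c.getOpacity() + " " + c.getRadius()); // prints "0.5 10.0"
    }
}
```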

If you have several hierarchies of classes using this pattern, you might want to extract the self() method into a separate interface or abstract class, such as public interface Self<T extends Self<T>>.
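One way that extraction could look, as a sketch: Self is the interface named above, and Shape.Builder implements it instead of declaring self() itself. This removes only the abstract self() declaration from each root builder; concrete builders still implement self(). Since interface methods are public, each self() implementation has to become public too, which slightly widens what callers can see. A builder in an unrelated hierarchy would implement the same interface. The getter and SelfDemo are illustrative additions.

```java
// Shared by every builder hierarchy that uses the pattern.
interface Self<T extends Self<T>> {
    T self();   // interface methods are public, so implementations must be too
}

class Shape {
    private final double opacity;

    // self() now comes from Self<T> rather than being declared here.
    public static abstract class Builder<T extends Builder<T>> implements Self<T> {
        private double opacity;

        public T opacity(double opacity) {
            this.opacity = opacity;
            return self();
        }

        public Shape build() {
            return new Shape(this);
        }
    }

    private static class Builder2 extends Builder<Builder2> {
        @Override
        public Builder2 self() {   // must be public now
            return this;
        }
    }

    public static Builder<?> builder() {
        return new Builder2();
    }

    protected Shape(Builder<?> builder) {
        this.opacity = builder.opacity;
    }

    double getOpacity() { return opacity; }
}

class SelfDemo {
    public static void main(String[] args) {
        Shape s = Shape.builder().opacity(0.5).build();
        System.out.println(s.getOpacity()); // prints "0.5"
    }
}
```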

You can have required constructor parameters by putting those parameters in each Builder's constructor, and in the static factory method in that variant. Of course then you lose the ability to name those parameters.
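A sketch of a required parameter, assuming (hypothetically) that Rectangle's height is mandatory. It moves into the constructors of Rectangle.Builder and Builder2 and into the static factory, and since it is set exactly once it can even be final in the builder. The getters and RequiredParamDemo are illustrative additions.

```java
class Shape {
    private final double opacity;

    public static abstract class Builder<T extends Builder<T>> {
        private double opacity;

        protected abstract T self();

        public T opacity(double opacity) {
            this.opacity = opacity;
            return self();
        }

        public Shape build() {
            return new Shape(this);
        }
    }

    private static class Builder2 extends Builder<Builder2> {
        @Override
        protected Builder2 self() {
            return this;
        }
    }

    public static Builder<?> builder() {
        return new Builder2();
    }

    protected Shape(Builder<?> builder) {
        this.opacity = builder.opacity;
    }

    double getOpacity() { return opacity; }
}

class Rectangle extends Shape {
    private final double height;

    public static abstract class Builder<T extends Builder<T>> extends Shape.Builder<T> {
        private final double height;   // required, so there is no height() setter

        protected Builder(double height) {
            this.height = height;
        }

        @Override
        public Rectangle build() {
            return new Rectangle(this);
        }
    }

    private static class Builder2 extends Builder<Builder2> {
        Builder2(double height) {
            super(height);
        }

        @Override
        protected Builder2 self() {
            return this;
        }
    }

    // The required value is positional, so it is unnamed at the call site.
    public static Builder<?> builder(double height) {
        return new Builder2(height);
    }

    protected Rectangle(Builder<?> builder) {
        super(builder);
        this.height = builder.height;
    }

    double getHeight() { return height; }
}

class RequiredParamDemo {
    public static void main(String[] args) {
        Rectangle r = Rectangle.builder(250).opacity(0.5).build();
        System.out.println(r.getHeight() + " " + r.getOpacity()); // prints "250.0 0.5"
    }
}
```

One trap in this sketch: because builder(double) has a different parameter list from Shape.builder(), it overloads rather than hides it, so Rectangle.builder() with no arguments still compiles and returns a Shape builder.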

You can have default values for parameters by providing initializers in the builder (not in the class itself!), for example:

public class Shape {
    ...
    public static abstract class Builder<T extends Builder<T>> {
        private double opacity = 1.0;
    ...
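A runnable sketch of the default: the builder's field starts at 1.0 and is overwritten only if opacity() is called. The default belongs in the builder because Shape's opacity field is final and assigned in the constructor, and a final field with an initializer could not also be assigned there. The getter and DefaultsDemo are illustrative additions.

```java
class Shape {
    private final double opacity;

    public static abstract class Builder<T extends Builder<T>> {
        private double opacity = 1.0;   // default used when opacity() is never called

        protected abstract T self();

        public T opacity(double opacity) {
            this.opacity = opacity;
            return self();
        }

        public Shape build() {
            return new Shape(this);
        }
    }

    private static class Builder2 extends Builder<Builder2> {
        @Override
        protected Builder2 self() {
            return this;
        }
    }

    public static Builder<?> builder() {
        return new Builder2();
    }

    protected Shape(Builder<?> builder) {
        this.opacity = builder.opacity;
    }

    double getOpacity() { return opacity; }
}

class DefaultsDemo {
    public static void main(String[] args) {
        Shape plain = Shape.builder().build();
        Shape faded = Shape.builder().opacity(0.5).build();
        System.out.println(plain.getOpacity() + " " + faded.getOpacity()); // prints "1.0 0.5"
    }
}
```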

Can we do better?

In my preferred solution, every class in the hierarchy has to duplicate the same boilerplate: a private Builder2 class with its one-line self() override, and a static builder() method. Is there a way to reduce this without resorting to smelly hacks? I have not found one, but I'm open to suggestions!
