First off, I’m quite dissatisfied with my work.
But then again, isn’t every architect? No matter how fantastically we break down and lay out complex enterprise systems, there’s always something to be dissatisfied with, even in the best logical designs, physical hardware, business logic, and user experiences. We know well enough that enterprise software development is never complete. Sure, user stories and discrete tasks can be marked “complete” in an issue tracking system, but large enterprise systems are virtual organisms that can be endlessly extended, refined, and improved upon. There is no finish line, but rather a multidimensional cube of gradients where each metric of success is defined and measured by different stakeholders. So, when I state I’m dissatisfied with my work, that’s not a state of being; it’s an acknowledgement that architecting and developing these systems is a continuum of satisficing stakeholders, not a process that is ever truly complete. We should be dissatisfied, because if we are not, we are complacent.
Measurements of Success
However, just because the composition of large and complex systems has no discrete end, it doesn’t mean success cannot be measured. A ton of metrics can be derived that mean something to the various parties in an ISV and its client ecosystem, and some of them can even be predictors of success. When I look at a system, I intrinsically think about the technical metrics first: the layers of indirection, query costs, how chatty an interface is, cyclomatic complexity, interface definitions, the segregation of responsibility, patterns that are reusable and durable from one set of developers to the next, et cetera. But architects must understand that while these metrics do play a role in the ultimate success, reusability, and appeal of a solution, they are not the metrics a business user will consider, and business users are usually the ones who define success at the level that matters for a sustainable, going-concern business. Instead, these technical metrics feed into other metrics, and those are the ultimate measure by which a product’s success will be judged. Specifically, there are only three things that executive offices, sales, and prospects care about:
- What does the system do? (What are the features and benefits?)
- What does the system look like when it does it? (What’s the visual user experience?)
- How fast does the system do it?
Note that absent from that list is a metric worded like “How does the system do it?” Inevitably the ‘how’ question is part of large Requests for Proposal (RFPs), but in my experience, at the end of the day, those questions are mere pass-fail criteria that rarely play into an actual purchase decision or a contract renewal decision. Quite often both junior and senior developers, and many times even management, fail to keep this in perspective. If a solution can demonstrate what it does (and what it does is what the customer needs it to do), that it does it in a pleasing way, and that it does it fast, users are satisfied.
That last item, “How fast does the system do it?”, seems out of place, doesn’t it? Now, any whiny sales guy (I used to work with a lot of them; thankfully we have an awesome team where I’m at now) can tell you how a sluggish-feeling web page can tank a demo, or blame a two-second render time for the bacon he didn’t bring home last quarter, and cloistered developers are used to brushing off those comments. They really shouldn’t. Speed directly determines the success of a product in three ways:
Users who have a slow experience are less likely to start to use the product
KISSmetrics put together a fantastic infographic on this subject that shows how page abandonment is affected by web page load times.
And let’s not fool ourselves: just because your product is served on an intranet, not for the fickle consumption of the B2C public Internet, doesn’t mean your users are any less fickle or demanding. Nor are you immune to this phenomenon because you use native clients or rich internet applications (RIAs) to deliver your product or service. Users will abandon your way of accessing their data if it’s too slow, even if you think they are a captive audience. In a world where data liberation is a real and powerful force, where users demand to export their data from your system to use the interface of their choice, or, worse, demand that you provide APIs to your data so they can use your competitor’s user interface, no audience is captive. Worse still for those of you providing a B2C public Internet service, page load times play into search engine optimization (SEO) ranking algorithms, meaning a slow site is less likely to even enter the consciousness of prospects who depend on a search engine to scope their perception of available services.
Users who have a slow experience are less likely to continue using a product
Let’s say you’ve enticed users with all your wonderful functionality and a slick Web 2.0 (I hate that term, for the record) user interface to visit your site, perhaps even sign up and take it for a spin. Most developers fail to realize that a clunky web browsing experience in an application doesn’t just temporarily frustrate users; it affects their psychological perceptions of the credibility of your product (Fogg et al. 2001) as well as the quality of the service (Bouch, Kuchinsky, and Bhatti 2000). In one analysis of a large data set from an e-commerce site, a one-second delay in page loads reduced customer conversion rates by 7%.
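To make that 7% figure concrete, here is a back-of-the-envelope sketch. The traffic and conversion-rate numbers are hypothetical, and the function name is made up for illustration; only the 7% relative drop for a one-second delay comes from the study cited above.

```python
def conversions_lost_to_one_second_delay(monthly_visits: int,
                                         conversion_rate: float,
                                         relative_drop: float = 0.07) -> float:
    """Conversions lost per month if pages get one second slower.

    relative_drop defaults to the 7% figure quoted in the text; it is applied
    to a single one-second delay only, not extrapolated to longer delays.
    """
    baseline_conversions = monthly_visits * conversion_rate
    return baseline_conversions * relative_drop


# Hypothetical site: 100,000 visits a month, 3% of which convert.
lost = conversions_lost_to_one_second_delay(100_000, 0.03)
print(f"{lost:.0f} fewer conversions per month")
```

Multiply the result by your average order or contract value and the "sluggish demo" complaint turns into a dollar figure.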
The above graphic is a visualization of a behavior model by BJ Fogg of Stanford University that shows how users’ motivation and ability create a threshold for taking action, and which triggers a product can use to entice users across that threshold depending on their position along this action boundary. Truly fascinating stuff, but to distill it down into the context of this blog post: the marketing of your product and the value proposition of your service should be creating high motivation for your end users. What a shame, then, if users never take action to use your product because you failed to reduce barriers to usage, and your sluggish site instead reduced their ability and increased complexity. Crossing that boundary is one hurdle users must clear, but ISVs have the ability to move the boundary through the way they market, design, and implement the product.
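A toy version of that threshold idea follows. It is a simplification of my own, not Fogg's formal model. The assumptions are that motivation and ability are scores between 0 and 1, that the action line is the curve where their product equals a fixed, made-up threshold, and that a trigger must be present for any action at all. The only point is that the same motivated user falls below the line when a slow site drags ability down.

```python
from dataclasses import dataclass

# Assumed position of the action line; a made-up value for illustration.
ACTION_THRESHOLD = 0.25


@dataclass
class User:
    motivation: float  # 0.0 (none) .. 1.0 (very high); set by marketing and value
    ability: float     # 0.0 (very hard) .. 1.0 (very easy); lowered by sluggishness


def takes_action(user: User, triggered: bool,
                 threshold: float = ACTION_THRESHOLD) -> bool:
    """Action happens only if a trigger arrives while the user is above the line."""
    above_action_line = user.motivation * user.ability >= threshold
    return triggered and above_action_line


# Same motivation, different ability (hypothetical numbers).
fast_site_user = User(motivation=0.8, ability=0.5)   # 0.40, above the line
slow_site_user = User(motivation=0.8, ability=0.25)  # 0.20, below the line

print(takes_action(fast_site_user, triggered=True))
print(takes_action(slow_site_user, triggered=True))
```

Marketing can push motivation up, but in this toy model a sluggish site cancels that effort by pulling ability down.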
The Cost-Speed Curvature
Okay, okay, you got it, right? The product needs to be fast. But how fast is fast enough? You can find studies from the late 1990s that say 8-10 seconds is the gold standard. But back in reality, our expectations are closer to the 2-3 second threshold. The wiggle room in this minuscule window is admittedly extraordinarily small: it doesn’t accept any excuses for the slow rendering speeds of ancient computers or low-powered mobile devices that might be using your site, the client’s low bandwidth, or bufferbloat in each piece of equipment between your server’s network card and your end user’s. Not to mention, most sites aren’t simply delivering static, cacheable content. They’re hitting web farms behind load balancers, often using a separate caching instance, subject to the disk and network I/O of a database server and any number of components in between to execute potentially long-running processes, all of which need to happen in a manner that still provides the perception of a speedy user experience.
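One way to see how little room that window leaves is to write the request path down as a latency budget. This is a minimal sketch. The stage names and millisecond figures are hypothetical placeholders, not measurements, and the budget is set to the low end of the 2-3 second threshold.

```python
# Low end of the 2-3 second threshold discussed above.
BUDGET_MS = 2_000

# Hypothetical per-request timings in milliseconds; replace with real profiling data.
stages_ms = {
    "network round trips (client to load balancer)": 300,
    "load balancer + web server": 120,
    "cache lookup": 15,
    "database query (disk and network I/O)": 450,
    "application logic": 200,
    "browser rendering on the client": 600,
}


def remaining_budget(stages: dict[str, int], budget_ms: int = BUDGET_MS) -> int:
    """Milliseconds left after every stage; a negative value means the budget is blown."""
    return budget_ms - sum(stages.values())


print(f"headroom: {remaining_budget(stages_ms)} ms")
print(f"largest stage: {max(stages_ms, key=stages_ms.get)}")
```

Note that several of these stages sit on the client's side of the network card, which is exactly the part no server-side excuse can cover.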
Now, exactly how to make your product or service faster isn’t my concern, and it’s highly dependent on exactly what you do and exactly how you do it: your technology stack and specific infrastructure decisions. What I can tell you, though, is that you need an answer for your executive suite, your board, or the ultimate impatient user who, no matter how performant (or not) your system is, asks, “How can we make this faster?” This answer shouldn’t be quantitative, as in, “We can shave 4 seconds off if we do Enhancement X, which will take two weeks”, unless you want to hear your words parroted back to you when you can’t deliver on such an unrealistic expectation. Even if you have an amazing amount of profiled data points about each component of your system, quantifying improvements is a mental exercise with little predictive value in enterprise solutions.
Well, in any serious enterprise software solution, there is obviously code you didn’t write and pieces you didn’t architect. Even if you were Employee #1 and not inheriting a mess from a predecessor team or architect, inevitably you’re using multiple black boxes in your interconnected system in the form of code libraries. Even if you’re a big FOSS proponent and can technically look at the source code for any of those libraries, face it: in a real business you will never have the time to do so, even if you have the nerdy interest. While you can sample the inputs to and outputs from each of those closed systems, you can predict, but you cannot quantify, how changing an input will affect the performance of a closed system producing an output. Don’t try it; you will fail.
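Sampling a black box usually looks something like the harness below. It is a minimal sketch using only the Python standard library. `opaque_library_call` is a hypothetical stand-in for a third-party call you can't inspect. The harness records the median wall-clock time for the inputs you actually tried, and nothing more. Consistent with the point above, it tells you nothing reliable about inputs you haven't sampled.

```python
import statistics
import time
from typing import Any, Callable, Iterable


def sample_latency(black_box: Callable[[Any], Any],
                   inputs: Iterable[Any],
                   repeats: int = 5) -> dict[Any, float]:
    """Median wall-clock seconds per (hashable) input for a component you cannot inspect."""
    results: dict[Any, float] = {}
    for value in inputs:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            black_box(value)
            timings.append(time.perf_counter() - start)
        results[value] = statistics.median(timings)
    return results


def opaque_library_call(n: int) -> int:
    """Hypothetical stand-in for a closed third-party routine."""
    return sum(i * i for i in range(n))


samples = sample_latency(opaque_library_call, [1_000, 10_000, 100_000])
for size, seconds in samples.items():
    print(f"input {size:>7}: {seconds * 1000:.3f} ms (median of 5)")
```

The median is used instead of the mean so that a single garbage-collection pause or noisy neighbor doesn't dominate the sample.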
Instead, remember my opening paragraph: performance optimization, much like “feature completeness”, is not a goal; it is a process that continues over the life of the product. Obviously, developers start this process by Googling StackOverflow et al. for “slow IOC startup” or “IIS memory issues in WCF services” or whatever the issue is with their particular technology stack, and by reviewing the “me too” comments to see if they, too, made a “me too” misconfiguration or misdesign. Maybe it’s “whoops, forgot to turn on web server GZIP compression” or “whoops, forgot to turn off debug symbols when I compile”. Typically, these are low-hanging fruit: low risk to effect change, with a high potential impact. But eventually you run out of simple “whoops!” Eureka moments or answers to simple questions, and you end up having to ask harder questions that have fewer obvious answers, which requires time spent specifically on researching those answers and developing solutions in-house. When you think about it, there’s a real escalating cost for each unit of performance gain over the lifetime of the product for this very reason. Graphed as a curve, I’ll call it the Marginal Cost of Speed.
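For a sense of why a “whoops, forgot GZIP” fix counts as low-hanging fruit, here is a small sketch with Python's standard `gzip` module. The HTML payload is made up. In production, compression is normally switched on in the web server or reverse proxy rather than in application code. The sketch only shows how much a repetitive, server-rendered page shrinks.

```python
import gzip

# Hypothetical server-rendered page: a table with many near-identical rows.
rows = "".join(f"<tr><td>Order {i}</td><td>Pending</td></tr>\n" for i in range(500))
html = f"<html><body><table>\n{rows}</table></body></html>".encode("utf-8")

compressed = gzip.compress(html)
ratio = len(compressed) / len(html)
print(f"{len(html)} bytes -> {len(compressed)} bytes ({ratio:.0%} of original)")

# Compression is lossless: the browser gets back exactly the same bytes.
assert gzip.decompress(compressed) == html
```

This is a one-line configuration change with a large payoff, which is exactly the kind of fix that runs out early.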
And this is, in fact, a reality that must be thoroughly understood inside a development team all the way up through the executive suite. Not dissimilar to how Einstein showed that accelerating a massive object to the speed of light would take infinite energy, the only way to get an instant page load or a zero-latency back-end process completion is by spending an infinite amount of resources achieving that goal. I say this has to be understood at the development team level mostly because you will never, no matter how pragmatic and persuasive you are, convince the executive suite or the customer that you in fact cannot repeat the last thing you did that doubled performance, because the further you go down the performance optimization road, the narrower and longer it gets between mile markers. The development team needs to fully understand what constitutes low-hanging fruit, and must focus its efforts first on the simple changes that effect the greatest change, rather than tackling such problems with an instinctive impulse to refactor.
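To put a shape to that asymptote, the sketch below uses a purely hypothetical cost model. Let p be the fraction of the ideal (instant) speed you've reached, and let the cumulative cost be C(p) = p / (1 - p). Its derivative, 1 / (1 - p)^2, is the marginal cost of the next sliver of speed. The formula is my own assumption, chosen only because it blows up as p approaches 1. Real cost curves aren't this tidy.

```python
def marginal_cost(p: float) -> float:
    """Hypothetical marginal cost at fraction p of ideal speed.

    Derivative of the assumed cumulative cost C(p) = p / (1 - p).
    Undefined at p = 1: an instant page load would cost infinitely much.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return 1.0 / (1.0 - p) ** 2


for p in (0.5, 0.9, 0.99):
    print(f"at {p:.0%} of ideal speed, the next unit costs {marginal_cost(p):,.0f}")
```

Under this model, a unit of speed at 99% costs 2,500 times what it cost at 50%. That is why the trick that doubled performance last quarter can't simply be repeated.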
Likewise, the executive and marketing teams need to understand that developing a lightning-fast product is a last-mile problem, and that reaching that nirvana will require an increasing amount of time (cost) and resources (cost). The effort is an exercise in satisficing the parameters to find an acceptable middle ground. Usually, though, the realities of product development aren’t treated the same as the realities of externally governed factors, simply because, being internal, they are perceived not to be governed by any absolutes. Put another way, customers of Amazon.com might abandon the site because shipping times for purchases are too long, but the company can’t just start comp’ing overnight service for everyone. Well, they could, but the cost to acquire each customer would skyrocket to a level that makes their business model unsustainable. Similarly, the time spent on performance optimization has a real and measurable cost, and it can actually be quantified as a cost to acquire and retain a customer, given how directly a performant site impacts customer acquisition and retention. The business folks can definitely understand it in those terms. But they’ll still want it faster anyway.
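Framed as an acquisition and retention cost, the arithmetic is simple. The hard part is the inputs. In this sketch, the dollar figures, the retained-account count, and the lifetime value are all hypothetical. Attributing retained customers to a speed improvement is itself an estimate.

```python
def cost_per_retained_customer(optimization_cost: float,
                               customers_retained: int) -> float:
    """Optimization spend divided by customers kept because the product got faster."""
    if customers_retained <= 0:
        raise ValueError("no retained customers: the spend bought no retention")
    return optimization_cost / customers_retained


# Hypothetical: a month of work for two engineers (~$30,000) keeps 12 accounts from churning.
cost = cost_per_retained_customer(30_000, 12)
customer_lifetime_value = 8_000  # hypothetical

print(f"${cost:,.0f} per retained customer")
print("worth it" if cost < customer_lifetime_value else "not worth it")
```

Once the optimization work is stated in these terms, it can be compared side by side with any other acquisition channel the business already funds.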
Where To Sit
So, where do you sit on that curve? The real answer is, it doesn’t really matter how much you do or don’t want to make performance optimizations, particularly if they’re approaching the infinite-cost asymptote of that graph. You will have to sit wherever your competitors sit. Most of us out there building the next great thing aren’t making markets; we’re creating displacement products. For those of us doing so, we’ve got to chase after wherever our most successful competitor sits on the marginal-cost-of-speed graph. Now, to be fair, those guys have probably been working for a few years on their ascent up that cost-performance climb, and they probably have deeper pockets and more slack time to do so than you do if you’re breaking into a market, but there is a trade-off the suits can make. The accumulated cost of reaching 90% of the graph is less than the cost of the last 10%. Put another way, if you can be performant enough to satisfy 90% of the prospects who are 100% happy with your competitor’s product, that may well be enough to displace enough business to let you keep tackling that last mile another day.
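The 90%-versus-last-10% trade-off can be checked against a toy curve. This sketch assumes the same kind of hypothetical cumulative cost model, C(p) = p / (1 - p), where p is the fraction of ideal speed reached. That formula is an illustrative assumption, not measured data. Under it, the last 10% never completes at all, and even the partial stretch from 90% to 99% costs ten times as much as everything before it.

```python
def cumulative_cost(p: float) -> float:
    """Hypothetical total cost to reach fraction p (0 <= p < 1) of ideal speed."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1); p = 1 would cost infinitely much")
    return p / (1.0 - p)


first_90 = cumulative_cost(0.90)                        # roughly 9 cost units
next_9 = cumulative_cost(0.99) - cumulative_cost(0.90)  # roughly 90 cost units

print(f"cost to reach 90%:    {first_90:.1f}")
print(f"cost from 90% to 99%: {next_9:.1f}")
```

If 90% is enough to win displaceable business, the model suggests banking that first and funding the climb with the revenue.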
Obviously, this question can’t be answered completely that way, because the answer is highly dependent on your specific markets. Are you entering a market with a democratic offering of grass-roots, home-grown alternatives, or are you tackling an industry dominated by an oligarchy? Are you targeting disparate customers, or are your customers banded together in trade associations? In other words, how much does your reputation change with each success or each failure? How easily can your customers back out of a contract if they find performance or other factors don’t match the vision sold to them? Depending on these answers, getting to “fast enough” for a good, marketable value proposition may require a disproportionately larger amount of resources and time.
In summary, you never really should sit anywhere on that curve; you should be climbing it. It will cost you more the further you climb, but you should never feel like you’re done optimizing performance, and you should never stop continuously reviewing it. Remember how I mentioned most of us are in the displacement business? Even if you’re not, someday, someone else will be, looking to displace you. That guy might be me someday, and rest assured, I won’t rest assured anywhere.