Kurt Guntheroth's Old Hands Blog

Thursday, May 28, 2020

Re-Intermediation -- Controlling the Internet as a Channel of Distribution

I saw an article on Slate about how Google wants to build the famous talking computer from Star Trek. Google doesn't want to return links that might contain the answer to your question, but rather to provide a direct answer. It's a romantic vision. I bet it motivates their engineering team. But it can never be.

There is a big, unbridgeable difference between Google and the Star Trek computer. Google wants to sell you something. If Google gets to the point where it can reasonably answer questions like, “What computer is best for me?”, or “Who has good prices on HDTV sets?”, or "What restaurants are nearby?", I won't be able to trust the answers, because they are shamelessly influenced by advertizer dollars.

What concerns me is whether I will have any choice in the matter.

The web was once touted as a powerful force for consumers, disintermediating old industries like TV networks, record studios, newspapers, and retail stores. But it is far more apt to think of the web as merely a new channel of distribution, disrupting older channels because of the internet's lower cost structure, and inserting new and voracious intermediaries between producers and consumers. Rather than share cost reduction with consumers, these new intermediaries want to capture all the savings as profit for themselves.

But an intermediary can only capture these savings if they dominate this new channel. Unfortunately, the internet makes that easy by reducing a company's brand name to a few keystrokes. Get that brand embedded in peoples' heads, and you own the internet as a channel. While Xerox fought for years to keep its name from becoming a verb, Google can laugh all the way to the bank. It may technically lose the ability to prevent a competitor naming itself Google, but it owns the google.com domain name, so who cares?

If Google can successfully provide direct answers instead of links, they will become the search engine. This gives them a huge advantage over other search engines, and enormous power over vendors of any product you may want to search for. Google will own the only path to find products. Amazon is set up as a marketplace, and is in one sense a competitor to Google, but people use Google first, before they even form the thought of buying a product.

Google is already dismantling the fence between their ad-based search results and organic results. If Google can answer conversationally, they will no doubt completely eliminate any distinction.

If Google becomes "the" marketplace, then they will also wield enormous power over vendors. They don't have to provide a direct way to sort products by price. They can sell placement in their search results, and extract a taste of every sale

Wednesday, June 7, 2017

Circular Buffer Container Class

C++ comes with a set of templatized container classes so developers don't have to re-implement a doubly linked list or balanced binary tree class for every project. This is a tremendous time-saver you may not appreciate unless you've had to live without. It can also be a trap that limits performance.

The roster of container classes in the standard library is thin; there are three kinds of sequence container (vector/array/string, deque, list/forward_list), and two kinds of associative container (map/multimap/set/multiset, unordered_map/unordered_multimap/unordered_set/unordered_multiset). With such limited choice, a developer who needs a container with specific behavior may be forced to accept a lot of unwanted implementation baggage.

By giving up the ability to access the sequence as a C-style array, it is possible to obtain a container that is about as fast as std::vector, but supports constant-time insertion and removal from both ends. One variation of this container is the circular buffer. Work to standardize the circular buffer is underway, and in the meantime boost has a version. The circular buffer makes a great queue data structure.

I ran boost::circular_buffer through the performance tests for sequence containers from my book Optimized C++. Here is a link to a long article reporting the results.

Only one new kind of container (the unordered associative container) has been added to C++ since 1995 when the Standard Template Library was introduced. There is a lot of interest from communities with performance issues in adding additional container classes to future versions of C++.

Friday, September 23, 2016

C++ Optimization Bibliography

I wrote a wonderful book on performance tuning in C++. Plug, plug, plug. Buy it now!

The web is full of free optimization advice. The best of these references support their findings with timing benchmarks. Intermediate in quality are references that speak to the behavior of specific processors, but don't contain timing data to allow the reader to reason about the size of the performance speedup. Least interesting are simple lists of recommendations presented without evidence. These are a few frequently-cited references I found useful.

Most people think of code tweaks when they write about optimization. The performance improvement from statement-level optimization is unlikely to make much difference unless it is magnified by being frequently executed, in a loop or frequently called function, so these tweaks are of uncertain value.

Free Resources on the Web

Here is a free taste of the optimization drug, to whet your appetite and make you want to read my book, or to get you over those two days of the shakes while Amazon delivers your paper copy.

Short Notes

Tips for Optimizing C/C++ Code (PDF): This short note contains two dozen short thought-mantras about optimizing code. Explanations are necessarily brief. It can serve as an, "I didn't know that!" kind of checklist.
Optimizing C and C++ Code (HTML): Two dozen short thought-mantras on code optimization from game developers.
Optimization in C and C++: good practices, pitfalls, by Sébastien Binet (PDF): Optimization advice on PowerPoint slides. Covers a lot of ground, focuses on linked structures, great graphics.
Programming Optimization (HTML), by Paul Hsieh: An interesting mix of general advice, plus brief specific low-level advice for optimizing statements and expressions.
Performance Optimization (HTML): This is chapter 25 of the book Object Oriented System Design. It is notable for providing language-independent descriptions of design-time optimization techniques. It uses a dense and somewhat obsolete jargon defined elsewhere in the book, that can be a little hard to read.
Three Optimization Tips for C++, Andrei Alexandrescu (HTML): Anything Alexandrescu says is worth reading. This is more of a case study than a general advice piece, and contains many more than three tips. It features reduction in strength, reduction in array writing, and memoization.

This link has commentary and more measurements on the counting digits example Alexandrescu uses in his article.
Efficient C/C++ Coding Techniques, by Darren G. Moss (PDF): This is a performance analysis of how several compilers available in 2001 generate code for common flow-of-control constructs, loops that index tables, and methods of structure composition. The findings are unsurprising; that switch is often better than if-elseif-else, that pointers are efficient in loops, and that a class with virtual multiple inheritance is expensive (if that wasn't obvious).
Writing Efficient C and C Code Optimization, Koushik Ghosh (HTML): A grab-bag of statement- and expression-level optimization techniques, provided without supporting evidence for its effectiveness. Compilers may do some of these optimizations.
Techniques for Scientific C++, by Todd Veldhuizen (PDF): A somewhat older, but interesting article with a grab-bag of optimization advice. Covers optimizing compile time, eliminating virtual functions, how aliasing interferes with optimization, expression templates,
C++ Optimization Strategies and Techniques (HTML) by Pete Isensee: Isensee really knows his stuff. This web page represents the state of the art in optimization circa 1999 (as he notes himself). It is slightly dated now, but still very relevant. The advice is well supported with timing measurements and examples.
Faster C++ (HTML): Another grab-bag of statement-level optimizations, plus discussion of profiling, compiler flags, and cache issues. Good bibliography at the end.
FastC++: Coding Cpp Efficiently blog (HTML): I haven't explored this blog extensively, but he seems to know a lot about getting the C++ compiler to use SSE extensions.
The Top 10 Myths of Video Game Optimization, Eric Preisz (HTML): Interesting viewpoint on several well-loved optimizations. Of particular interest to me, that LUT-based algorithms may be a lose if they have reduced cache performance, since a cache miss is slower than the slowest instruction. The moral would be that small LUTs are good, but big ones can be problematical.
Mentor low Latency C or C++ code (HTML): This is an article on verifying that programs meet their latency goals, not on optimizing C++ per se, But it has a bunch of good rules about memory allocation, and a couple of other optimization topics, covered without supporting evidence.

Performance of a Circular Buffer vs. Vector, Deque, and List (HTML): This article reviews the performance measurements from chapter 10 of my book Optimized C++, plus a new type of container, the circular buffer, that is a candidate for inclusion in C++20.

Longer Works

Technical Report on C++ Performance ISO/IEC TR 18015:2004 (PDF)

This is a definitive 202 page report on performance-related issues in C++03 by the ISO C++ Committee, fully supported with extensive timing evidence. At the time this TR was written, C++ was under an unjustified attack for being inefficient compared to C. TR 18015 is amazing, offering a mature look at C++ which includes much of the optimization advice in Optimized C++, albeit in a compressed form. I was aware of TR 18015 before I began writing my book, then forgot about it until the book was in production.It's a gem.

TR 18015 covers all of the following:

Types of programs (embedded programs on small devices, servers running flat out) with performance issues
Costs of various C++ features: namespaces, type conversion and casting, classes and member function calls, exception handling, and templates, all supported with run time measurements.
Two methods of exception handling (code, table). This discussion also contains a list of alternative error handling mechanisms.
Ways to make the standard I/O functions faster by fiddling with locales.
A hardware interface using atomic and volatile.

TR 18015 spawned some articles in the embedded systems press. Some of these articles offer a much less mature “C++ is less efficient than C” viewpoint.

Efficient C/C++ Coding Techniques, Moss, 2000: "Explains" that C++ is less efficient than C if you use virtual multiple inheritance when simple containment will do. Some comparisons of control flow choices also motivated by poor understanding about what’s good about C++. Great diagrams on virtual functions and virtual inheritance, though.
Using C++ Efficiently In Embedded Applications, Quiroz, 1998: "Explains" that C++ exception handling is expensive because you don’t really need to destroy objects as you exit each stack frame, which is fine if resource leaks are OK with you. Paper talks about some C++ features and categorizes them into cheap, maybe-cheap, and expensive categories.
Reducing Run-Time Overhead in C++ Programs, Saks, 2002: A paper on optimizing C++. It has great 90/10 law quotes, and covers implicit copying and custom new and delete. It cites Bulka and Mayhew in its bibliography.
Representing and Manipulating Hardware in Standard C and C++, Saks: A paper on how hardware registers may be represented in C++.

Optimizing Software in C++: Agner Fog (PDF)

This is a long (150 page) treatment of statement and expression-level optimization in C++. The material is informed by deep knowledge of the x86 architecture, and makes reference to architectural details. It is only partially translatable to other platforms. Agner Fog actively maintains this document, so its content changes from time to time. If you find it useful, it may be worth caching a copy for your own use, lest the material change out from under you.

Optimizing C++: A book about improving C++ performance (Wikipedia | PDF)

There is some useful information in this small collection of Wikipedia entries, though not enough to call it a book. Good things about this wikibook include:

It evolved substantially during the time I was writing Optimized C++, growing to cover more material better.
It has a bibliography, including free on-line references, and some books that were new to me.

There are several problems with it, though:

No topic is covered exhaustively. There are, for instance, a few algorithmic techniques, but not a complete or even extensive catalog.
Most optimization advice is supported only by hand-waving arguments about how a compiler might behave.

Books

Efficient C++, Dov Bulka and David Mayhew (Amazon link) 1999, Addison Wesley, ISBN: 0-201-37950-3: Meticulously researched, but now slightly dated. I have tremendous respect for this work.
The Software Optimization Cookbook, Gerber, Bik, Smith, Tian (PDF table of contents): I have not read this book, but it appears to be a useful source of information on processor-dependent optimization for the IA-32 (Intel 32-bit) architecture.
Optimizing C++, Steve Heller (HTML), 1999, Prentice-Hall, ISBN: 0-13-977430-0: Warning: not about performance tuning in C++ at all, but rather about optimizing an ISAM database. You can view it for free on the web.
Optimized C++, Kurt Guntheroth (Amazon link), 2016, O'Reilly Media, ISBN: 978-1491922064: My book, the best of everything, of course.

Other Notes

Microsoft Visual C++ Tips and Tricks has all sorts of debugger tricks. Not about optimization, but interesting.

Friday, December 4, 2015

Mastery Takes Time

I have talked about achieving mastery previously. Here's another, perhaps better researched article on how long it takes to achieve mastery of a discipline. For those too indolent to click a link, what it says is that it takes about 10,000 hours, or about five full-time years, to obtain a reasonable degree of mastery, and it doesn't matter what it is; playing the violin, swinging a baseball bat, or coding in C++. A study of violinists indicated that "natural" talent was not a factor. The people who were the best were the ones who practiced the most and the longest.

This is something recent college grads all seem not to know about the careers on which they are embarking. They think (like I once thought) that their natural intelligence and their good education was what made the difference. As an Old Hand, I can tell you that practice has a lot to do with it, and good as you are, you will get a lot better when you've been doing the job for 20 years.

Don't Teach Yourself to Code by Writing a Library

It is a terrible idea to teach yourself how to use some unfamiliar feature of C++ by designing a library that heavily uses this feature.

I cannot think of an easily explained example, because libraries that come into existence in this way are very rarely easily explained. Even experienced C++ folks look at the thing and say, "Huh...WHAT?! Can you DO THAT?" Readers with more than a couple years of development experience are grimacing right now, because chances are good that the code they are paid to work in every day relies on a library that some genius thought would be cooler if it all used some tricks they'd been reading about.

It may be cool to the original author, but to anyone who hasn't read the same blogs as the author, it's just mysterious. Especially mysterious when the author has trouble doing what they originally imagined was possible, and ends up covering this lack of experience with a hack, that then lingers long after the author stops working on the library.

Writing libraries is a great time to slow down; to be conservative; to solicit the widest possible review; to ask your peers, "Is this good style?"

Thursday, June 25, 2015

The Mysterious Hidden Languages of C++

C++ is old enough to have teenaged children (like Java, conceived when C++ was itself a wild teenager). Some lucky developers have spent virtually their whole career writing C++. But while C++ grew and solidified, the world of programming languages bubbled and foamed around it. Languages and language ideas came and went. Lexical styles changed and changed again. In these modern days of Hascell, python and Javascript, it can be hard to remember that the folks who originally designed C and C++ learned to program in BCPL.

The Secret Expression Language Inside C++

An expression language is a language where every executable statement produces a value. The vestigal expression language embedded in C++ includes individual expressions, sequences, blocks, if/else, and function call, but not iteration, alas.


x = 1, y = 2, z = x + y

is a sequence. It sets the values of variables x, y, and z, and produces the value 3.

(rand(), 0)

is a block. It calls the function rand(), and produces the value 0.


x == 1 ? y : (z, 0)

is an if/else statement. It evaluates the boolean condition x == 1. If x and y have the values set in the previous expression, it produces the value 2. If x is not equal to 1, it produces zero.

C++ Has a Shiny New Functional Language Too

Here's the old syntax for a function.


int foo(int x)
{
    return x*x-1;
}

Here's the same function as a lambda. Return value in a different place. Syntax for associating a name with the function different. New rules to remember about the difference between lambdas and function pointers.


auto foo = [](int x)
{
    return x*x-1;
};

Lambdas have a whole new scheme (ouch, pun) of type deduction rules. Someday they are going to be special. But today they are mostly syntactic sugar, for people who grew up with Javascript or some exotic functional language so they can pretend that C++ is trendy too.

Monday, March 30, 2015

Aphorisms for Work

Here are some aphorisms that I think upon, usually during meetings with grumpy managers.

It takes the same time to do a project right as to do it wrong

You just spend the tine in different places. You can rush through coding without any planning, but you will spend a lot of time in front of the debugger, and maybe have to throw your code away and start over when you understand the problem better. Or you can do thoughtful design, code at a deliberate pace, run your unit tests, and be done. It comes down to a lifestyle choice.

The most dangerous thing in the world is a half-understood Harvard Business Review article

There's an HBR article that says, essentially, "All other things being equal, getting software done sooner results in greater revenue." This one paper is the cause of so much misery for development teams. Managers love to use this article as an excuse to pry the software out of development and jam it into production before it's finished. They never read the "all other things being equal" part of the article.

There is an HBR article that says, "Pay is not what motivates employees the most." There are managers who read this article and think they can overwork and underpay their employees. They don't read the part about what does motivate employees, like empowerment, recognition, and work/life balance.

The most pain is inflicted when the manager hears a summary second-hand, and doesn't even know he has only half the advice. He thinks, "There's an HRB article that says you should ship the project as early as possible", or "HBR says employees don't care that much about pay."

When everything is important, nothing is

Ever have your manager tell you on Monday to work on this, because it is the most important thing, and then tell you on Tuesday to work on that, because it is the most important thing. And then you ask him to choose which is most important, and he says they both are? That is what a weak manager does when he doesn't want to make a decision. There must be a most important thing, the thing to finish first.

If everything is important, if everything is an emergency, then nothing is really important, because your manager will blame you for what you didn't choose, no matter what you did choose. If your manager behaves in this way, it's a good time to get your resume updated and start looking for work.

The 90/90 Rule: The first 90% of development takes 90% of the time, and the last 10% takes the other 90% of the time

Oh, how many times has a developer said he was 90% done in one month, only to spend a second month on that last pesky ten percent? You're 90 percent done with the part of the task you know about. Generally that's about half the task. It's a very mature organization indeed that can correctly estimate completion percentages. Your best chance is to estimate things using a consistent process, and then consistently record how your estimate matched reality. Then you can de-rate future estimates until your estimation process matures.