You are on page 1of 4

Delete an inline function, save 794 kB

Posted on January 22, 2017by brucedawson

In the previous episode of Simple Changes to Shrink Chrome I discussed how deleting
const from a few key locations could lead to dramatic size savings, due to a VC++
compiler quirk. In this episode Ill show how deleting an inline function definition can
lead to similar savings.
The savings this time are less important as they are mostly in the .BSS segment, but
there are also some modest code-size savings, and some interesting lessons. Its worth
confessing up front that this time the problem being solved was not caused by a
compiler quirk it was entirely self inflicted.
Doing this investigation has reminded me that the behavior of linkers is best described
by chaos theory the details of their behavior defy prediction by simple heuristics, and
the results can be changed dramatically by tiny butterflies code changes.
In August of 2016 Chrome switched meta-build systems, from gyp to gn. Both gyp and
gn are systems for creating ninja files and these are the files that define how Chrome is
built. Changing from gyp to gn was a huge project because of the thousands of
assumptions that were embedded in Chromes many .gyp files, but the change was
necessary gyp was showing its age.
Part of my contribution was to help investigate binary size regressions and one
technique I used was to export symbol lists and see which ones appeared in gn builds
but not gyp builds. Here are the six largest globals in chrome.dll from a gyp build near
the transition:
gyp:
122784 kBrotliDictionary
65536 jpeg_nbits_table
54904 propsVectorsTrie_index
47152 device::UsbIds::vendors_
44815 kNetworkingPrivate
39640 propsTrie_index
And here are the six largest globals from a similar gn build:
gn:
131072 ff_sin_65536
131072 ff_cos_65536
122784 kBrotliDictionary
65536 jpeg_nbits_table
65536 ff_sin_32768
65536 ff_cos_32768
The ff_sin* and ff_cos* globals (which go from 65536 entries down to 16) are from
ffmpeg, and since they werent in the gyp build they clearly arent supposed to be in
chrome.dll. A bit of investigation using linker_verbose_tracking.py led to this
change which switched some source_set targets (lists of object files to be linked) to
static_library targets, thus telling the linker to be less aggressive about pulling in their
code. This worked, I closed the bug, and got on with my life.

Are you sure?

Until I noticed that the arrays were still there.


I had made the amateur mistake of failing to verify my results on Chromes official build.
The fix should have worked on those builds as well, but Murphys Law doesnt care
about should.
So, I repeated my investigation using the same tools, but now using Chromes official
build. This was a painful process. Chromes official build uses Link Time Code
Generation (also known as Link Time Optimization) which means that every change, no
matter how small, regenerates all of the code for Chromes enormous DLLs. My iteration
times went from a couple of minutes to about an hour. Uggh.
But, its just computer time. I filled in the gaps with other projects. The bigger problem
was that my technique was not working. No matter how many source_set targets I
changed to static_library targets the linker kept finding multiple reasons to pull in
ffmpeg.
My Python script works by scanning the verbose linker output and printing the chain of
dependencies that lead to a particular object file (in this case rdft.obj, the one that
defines the arrays) being pulled in. I decided that I needed to modify my tool to print
out why each object file was being pulled in. So, I changed it to list which symbol caused
a particular object file to be pulled in and I started seeing results like this:
rdft.obj referenced by
avfft.obj for symbol ff_rdft_init
avfft.obj referenced by
FFTFrameFFMPEG.obj for symbol av_rdft_calc
FFTFrameFFMPEG.obj referenced by
PeriodicWave.obj for symbol blink::FFTFrame::doInverseFFT
PeriodicWave.obj referenced by
SkGeometry.obj for symbol log2f
Looking at a few sets of these results I noticed some strange things. The crucial object
file seemed to always be PeriodicWave.obj, and it was always being pulled in because
some other object file needed a copy of log2f. This was peculiar because
PeriodicWave.obj is a Blink (ne WebKit) file, which shouldnt be needed in chrome.dll.
Its also peculiar because log2f is a C Run-Time (CRT) function, which shouldnt be
supplied by Blink anyway.

PeriodicWave.cpp didnt contain a


definition for log2f which left me
confused for a while but I eventually
put the pieces together:
1.

MathExtra.h defined inline


log2 and log2f function
2.
PeriodicWave.cpp included
MathExtra.h and contained
some calls to log2f
3.
When VC++ notices a call to
an inline function it generates a
non-inline version of that
function, in case it doesnt get
inlined
4.
That non-inline version of log2f happened to get pulled in by the official builds
which then pulled in the rest of PeriodicWave.obj, and the /OPT:REF optimization
that is supposed to discard the unused pieces was unable to get rid of them
The definition of log2 and log2f in MathExtras.h were only ever intended to work
around their absence in VC++ and Android. Since VC++ now had those functions the
inline definitions were unnecessary, and apparently actively harmful. So I landed a pair
of changes that removed them, first just for Windows and then for Android:
Dont define log2 and log2f in blink (Windows only)
Dont define log2 and log2f in blink for Android
The one remaining mystery is why this only happened in the official builds. I dont know
the answer with 100% certainty, but I know enough about how symbol resolution works
in linkers to have some ideas. Linkers start out with the set of object files that you have
specified on the command line the contents of these object files will be included in the
final binary (unless the linker can prove that removing them makes no difference). Then
the linker builds up a list of unresolved symbols and starts scanning through the set of
libraries that were specified on the command line the object files in these libraries will
be included as-needed. Whenever a needed symbol is found in one of the library object
files then that object file is added to the list to be linked it becomes like one of the
command-line object files. The process continues until all symbols are resolved.
The symbol search order is unspecified by C++ and due to the One Definition
Rule (ODR) the search order should not matter. But, an obvious implementation is to
repeatedly scan through the libraries, looking for symbols, until there are no more
unresolved. This means that if you have two copies of the same symbol then it is
undefined which you will get it depends where you are when you first find that you
need it. Since pulling in a symbol pulls in the rest of its object file this means that linking
is fundamentally unpredictable. And, if you have ODR violations (as we did) then the
results are not even defined.
So, the official builds perturbed the search algorithm enough that a different copy of
log2f was found, and that led to unintended consequences.

Ideally the linker would notice the ODR violation, but doing so ends up being expensive
you have to look into every object file to see if any of functions are defined in different
ways, and the definition of different is not obvious. There are good reasons that the C+
+ standard does not require diagnostics for ODR violations.
Ideally chrome.dll wouldnt even list Blink as something to be linked with. Another
Chrome developer has since fixed that.
I still feel bad about not noticing that the arrays were in our official builds after I fixed
them the first time. Luckily Im the sort of developer who randomly downloads symbols
for Chromes canary builds and runs analysis tools on them, so I eventually redeemed
myself.
If youre the sort of developer who likes browsing Chromes source then you can see the
(tiny) change for Windows by typing > git show 036e2ce in a Chromium repo. A more
accessible version can be found here.
The net effect of this change on the size of chrome.dlls sections (in-memory size) was:
chrome.dll
.text:
-6656 bytes change
.rdata:
240 bytes change
.data:
-786624 bytes change
.rodata:
-592 bytes change
.reloc:
-872 bytes change
Total change: -794504 bytes
More details of the techniques used can be found here. If you want to poke at some
relevant chrome.dll PDBs then you can use retrievesymbols from UIforETW to grab
them and then run SymbolSort or ShowGlobals on them. Just set
_NT_SYMBOL_PATH to something like
SRV*c:\symbols*https://msdl.microsoft.com/download/symbols and then run one of
these:
> retrievesymbols B6A75B4A5FE4446CAB6AE159B22DD91F 1 chrome.dll.pdb gyp
build (September 24th, 2016)
> retrievesymbols 0D12FEA487E942BB8719672D45464C51 1 chrome.dll.pdb gn
build (October 30th, 2016)
> retrievesymbols C39CCDB899B34A7CBCC25F51E64B1F62 1 chrome.dll.pdb recent
gn build (January 13, 2017)
These werent the PDBs used for the original investigation, but they show the issue.

You might also like