LLVM Machine Passes
LLVM has a very rich set of analysis and transformation passes dedicated to operate on the Intermediate Representation (IR) format which is used as a common abstraction between the frontend operating on the very high level of source code, and the backend responsible for lowering the IR into a respective target ISA. The IR is provided in a static single assignment form (SSA)1. The primary feature of SSA is the requirement of a single assignment to every variable. Although LLVM as a project suffers from many undocumented features and tools, the documentation on IR passes is suprisingly good, including a propably complete list of available passes2 and a quite good introduction3 on implementing passes, although the content on pass manager and specifying requirements could be updated to reflect the difference between legacy and modern pass manager.
But what happens when we try to analyze the machine representation of the code on the target side? According to code generator documentation4 and the avery good article ‘Life of an instruction in LLVM’5, IR building blocks have their counterparts in the backend. An IR function is represented by a MachineFunction
on the backend side. MachineFunction
contains possibly multiple MachineBasicBlock
instances, and machine basic blocks contain instances of MachineInstruction
. These three types have different relationships with their IR counterparts. A MachineFunction
always represents an IR Function
while the MachineBasicBlock
might map a BasicBlock but it does not have to! That’s why getFunction()
returns a reference and getBasicBlock()
returns a pointer which might be null. Finally, the machine instruction does not correspond directly to IR Instruction
at all.
But why is there a one-to-many mapping in basic blocks? The example below explains how the abstracted view of the machine in the IR can’t match perfectly the actual assembly.
This code is compiled in clang with flags -O3 -fno-vectorize -fno-unroll-loops
to keep the code simple; loop vectorization and unrolling will create additional machine basic blocks as well but the representation is harder to understand.
; Function Attrs: norecurse nounwind readonly uwtable
define dso_local double @_Z1fPdi(double* nocapture readonly, i32) local_unnamed_addr {
%3 = icmp sgt i32 %1, 0
br i1 %3, label %4, label %6
; <label>:4: ; preds = %2
%5 = zext i32 %1 to i64
br label %8
; <label>:6: ; preds = %8, %2
%7 = phi double [ 1.000000e+00, %2 ], [ %13, %8 ]
ret double %7
; <label>:8: ; preds = %8, %4
%9 = phi i64 [ 0, %4 ], [ %14, %8 ]
%10 = phi double [ 1.000000e+00, %4 ], [ %13, %8 ]
%11 = getelementptr inbounds double, double* %0, i64 %9
%12 = load double, double* %11, align 8
%13 = fadd double %10, %12
%14 = add nuw nsw i64 %9, 1
%15 = icmp eq i64 %14, %5
br i1 %15, label %6, label %8
}
This is the most basic representation of our code - if the second parameter length
is greater than zero, the control flow reaches loop preheader with label 4
and then loop with label 8
where the actual computation is performed. The exit block 6
defines the final result returned from the function with a phi node
which selects either a constant 1 or the loop result, depending on whether the loop has executed at least one iteration. PHI nodes are a very fundamental concepts in the SSA representation but they don’t exist in hardware. Therefore, in this case the basic block is duplicated as we can see on the listing below.
# Machine code for function _Z1fPdi: NoPHIs, TracksLiveness, NoVRegs
Constant Pool:
cp#0: 1.000000e+00, align=8
Function Live Ins: $rdi, $esi
bb.0 (%ir-block.2):
successors: %bb.3(0x50000000), %bb.1(0x30000000); %bb.3(62.50%), %bb.1(37.50%)
liveins: $esi, $rdi
TEST32rr renamable $esi, renamable $esi, implicit-def $eflags
JLE_1 %bb.1, implicit $eflags
bb.3 (%ir-block.4):
; predecessors: %bb.0
successors: %bb.4(0x80000000); %bb.4(100.00%)
liveins: $esi, $rdi
renamable $eax = MOV32rr killed renamable $esi, implicit-def $rax
renamable $xmm0 = MOVSDrm $rip, 1, $noreg, %const.0, $noreg :: (load 8 from constant-pool)
renamable $ecx = XOR32rr undef $ecx(tied-def 0), undef $ecx, implicit-def dead $eflags, implicit-def $rcx
bb.4 (%ir-block.8, align 4):
; predecessors: %bb.3, %bb.4
successors: %bb.2(0x04000000), %bb.4(0x7c000000); %bb.2(3.12%), %bb.4(96.88%)
liveins: $rax, $rcx, $rdi, $xmm0
renamable $xmm0 = ADDSDrm killed renamable $xmm0(tied-def 0), renamable $rdi, 8, renamable $rcx, 0, $noreg :: (load 8 from %ir.scevgep, !tbaa !2)
renamable $rcx = nuw nsw ADD64ri8 killed renamable $rcx(tied-def 0), 1, implicit-def dead $eflags
CMP64rr renamable $rax, renamable $rcx, implicit-def $eflags
JNE_1 %bb.4, implicit $eflags
bb.2 (%ir-block.6):
; predecessors: %bb.4
liveins: $xmm0
RETQ $xmm0
bb.1:
; predecessors: %bb.0
renamable $xmm0 = MOVSDrm $rip, 1, $noreg, %const.0, $noreg :: (load 8 from constant-pool)
RETQ $xmm0
# End machine code for function _Z1fPdi.
The exit block from IR corresponds to the machine basic block bb.2
which returns the value in vector register xmm0
. Additionally, a second exit block bb.1
is provided for the case when loop is not executed at all. A similar pattern can be seen in the assembly for x86 target where the new block moves the constant value to the register before executing retq
.
From an LLVM bitcode, we can go directly generate object file which in our case is stored as an x86-64 ELF format, by using the LLVM static compiler llc
llc -filetype=obj file.bc -o file.o
with the option -filetype=obj
. One can also generate an assembly listing
llc -S file.bc -o file.s
llvm-mc -filetype=obj file.bc -o file.o
Additionaly, the flag -asm-show-inst
generates the listing with mapping to LLVM machine instructions and operands in comments. When building C/C++ programs, these steps are wrapped inside clang
tool and hidden from the user 6.
The well-documented and properly explained infrastracture ends as soon as we try to write and run passes on machine code representation. It sounds from this November 2015 discussion on the mailing list that there’s no way of running the pass dynamically during target code generation. llc docs explain that the -load
option, known also in opt, is used to dynamically load targets, not passes. For the purpose of this tutorial, we want to write a pass printing machine blocks and basic statistics on size and correspondence to IR block. llc
already offers an ability to print out machine instructions after every optimization step by using the flag -print-machineinstrs
. The contents of this tutorial are mostly based on the information found on LLVM mailing list7.
The code generation operates in several phases which are responsible for selecting instructions, allocating registers. performing late optimizations such as peephole optimizations and emitting the code8. We want to run our pass quite late, just after target optimizations and before the lowering of machine instructions to target code. Since there’s no way to dynamically load the pass, we need to statically embed our pass in X86 target generation chain.
First, we implement our machine pass that visits functions in the code. As the naming scheme suggests, a machine counterpart of a FunctionPass is simply a MachineFunctionPass. Only a single function runOnFunction
needs to be implemented . Since we want to print a basic information on block size, it might look like this:
We use the macro INITIALIZE_PASS
as a first step of pass registration. It defines a function initializePASSNAMEPass
in namespace llvm
and we need to provide an additional declaration. Furthermore, we need to supply a function returning a dynamically allocated pass object. Both function are used by the target configuration to add our pass to the pipeline.
The we need a forward declaration of the create function in lib/Target/X86/X86.h
namespace llvm {
...
FunctionPass *createMachinePrinter();
}
Forward decalarations allow compiling the target configuration independently from passes implementation. Thus, passes can be developed and improved without a significant project rebuild. Now we need to insert our analysis pass into X86 target configuration. We start in lib/Target/X86/X86TargetMachine.cpp
with a forward declaration and a call to initialization function in lib/Target/X86/X86TargetMachine.cpp
.
namespace llvm {
void initializeMachinePrinterPass(PassRegistry &);
}
extern "C" void LLVMInitializeX86Target() {
PassRegistry &PR = *PassRegistry::getPassRegistry();
...
initializeMachinePrinterPass(PR);
}
Since I want to look at machine code after all implementations have been finished, the pass is called just at the very end:
void X86PassConfig::addPreEmitPass2() {
...
addPass(createMachinePrinter());
}
As the last step we modify the CMakeLists.txt
in lib/Target/X86/CMakeLists.txt
to add the new .cpp file to sources
list. The new pass should be available when using llc to build x86 targets.
As we see, this process is not complicated and one can set up a machine pass prototype in just a few minutes. It’s a pity that this process is not so well documented and supported as IR passes. And as far as machine basic block statistics are concerned, this method is reliable only when static information is concerned. Dynamic block counts need profiling information that is available on the IR level, and the one-to-many mapping between IR and machine basic block prevent us from obtaining a complete and accurate result. For that purpose one needs to use other instrumentation tool, for example the Pin
tool which is capable of counting basic block instances.
-
SSA on Wikipedia, a free textbook on SSA provided by Inria - work in progress! ↩
-
Life of an instruction LLVM, Eli Bendersky ↩
-
January 2018 mailing list discussion explains the differences between building an application with various pipelines based on clang, opt and llc. ↩
-
November 2015 discussion on writing a machine pass, April 2014 discussion which brings the problem of finding correspondence between IR and machine instructions. The mentioned
-debug-ir
was deprecated and removed in November 2014. As of October 2018, there’s an open review for a new debug pass. July 2016 question on matching machine instructions with IR, no answers unfortunately. ↩
Enjoy Reading This Article?
Here are some more articles you might like to read next: