Undefined Behavior in LLVM
John Regehr
Trust-in-Soft / University of Utah
• sqrt(-1) = ?
– i
– NaN
– Arbitrary value
– Exception
– Undefined behavior
• Undefined behavior (UB) is a design choice
– System designers use UB when they don’t feel like committing (or can’t commit) to any particular semantics
Undefined behavior is undefined
• Technically, anything can happen next
– “Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to having demons fly out of your nose.”
• In practice, UB is implemented lazily: by assuming it will never happen
(image from @whitequark)
(image from EvilTeach on Stack Overflow)
Common consequences include…
• Predictable and useful result on one platform, different result on another platform
• Unpredictable or nonsensical result
• Memory corruption
• Remote code execution
• Trap or fault
• No consequences at all
• AVR32 (embedded CPU):
• Scheme R6RS:
• C/C++ have tons and tons of undefined behaviors
– divide by zero, use of dangling pointer, shift past bitwidth, signed integer overflow, …
• LLVM has undefined behavior too
#include <limits.h>
#include <stdio.h>

int foo (int x) {
  return (x + 1) > x;
}
int main () {
  printf("%d\n", (INT_MAX + 1) > INT_MAX);
  printf("%d\n", foo(INT_MAX));
  return 0;
}
$ gcc -O2 intmax-overflow.c ; ./a.out
0
1
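One way to sidestep the signed-overflow case in this example is to test the operands before adding. The following is a minimal sketch and not something from the slides: `checked_add` and `increment_is_greater` are hypothetical helper names, and treating an unrepresentable x + 1 as "not greater" is an assumed policy.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores a + b in *out and returns true only when the
   sum is representable as an int. The range test runs before the addition,
   so no overflowing signed addition is ever evaluated. */
static bool checked_add(int a, int b, int *out) {
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Overflow-free variant of foo(): is x + 1 greater than x?
   Assumed policy: answer false when x + 1 does not fit in an int. */
static bool increment_is_greater(int x) {
    int sum;
    return checked_add(x, 1, &sum) && sum > x;
}
```

Because the answer for INT_MAX is now decided by the range test rather than by an overflowing addition, the function has a single defined result for every input.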
#include <stdio.h>
#include <stdlib.h>

int main() {
  int *p = (int*)malloc(sizeof(int));
  int *q = (int*)realloc(p, sizeof(int));
  *p = 1;
  *q = 2;
  if (p == q)
    printf("%d %d\n", *p, *q);
}
$ clang -O realloc.c ; ./a.out
1 2
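The surprising "1 2" comes from using p after a successful realloc, which C treats as use of a pointer to a freed object even when the address is unchanged. A minimal sketch of the usual defensive pattern follows. The helper names `resize_ints` and `store_after_resize` are hypothetical, not from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: resizes an int buffer and returns the pointer the
   caller must use from now on. On failure the old block is still valid,
   so it is returned unchanged and *ok is set to 0. */
static int *resize_ints(int *old, size_t count, int *ok) {
    assert(ok != NULL && count > 0);
    int *fresh = realloc(old, count * sizeof *fresh);
    if (fresh == NULL) {
        *ok = 0;
        return old;
    }
    *ok = 1;
    return fresh;   /* success: the old pointer value must not be used again */
}

/* Keeps exactly one live name for the buffer by overwriting p. */
static int store_after_resize(void) {
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    *p = 1;
    int ok;
    p = resize_ints(p, 2, &ok);
    if (ok)
        p[1] = 2;
    int result = ok ? p[0] + p[1] : p[0];
    free(p);
    return result;
}
```

Overwriting p with the returned pointer leaves no stale alias around for the optimizer to reason about separately.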
Without -DDEBUG
void foo(char *p) {
#ifdef DEBUG
printf("%s\n", p);
#endif
if (p != 0)
bar(p);
}
_foo:
    testq %rdi, %rdi
    je    L1
    jmp   _bar
L1: ret
With -DDEBUG
void foo(char *p) {
#ifdef DEBUG
printf("%s\n", p);
#endif
if (p != 0)
bar(p);
}
_foo:
    pushq %rbx
    movq  %rdi, %rbx
    call  _puts
    movq  %rbx, %rdi
    popq  %rbx
    jmp   _bar
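In the -DDEBUG build the null test is gone: passing a null pointer to printf("%s") is undefined, so the compiler may assume p is non-null from that point on. A minimal sketch of the reordered function is below, with the check placed before the first use. The `bar` defined here is only a stand-in that counts calls, and `bar_calls` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

static int bar_calls = 0;

/* Stand-in for the slide's bar(); it only records that it was called. */
static void bar(char *p) {
    assert(p != NULL);   /* foo() below guarantees this */
    bar_calls++;
}

/* The null check now dominates every use of p, including the debug print,
   so no path evaluates printf("%s") with a null argument. */
static void foo(char *p) {
    if (p == NULL)
        return;
#ifdef DEBUG
    printf("%s\n", p);
#endif
    bar(p);
}
```

With this ordering the check no longer follows an operation that already requires p to be valid, so the earlier use gives the compiler no basis for deleting it.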
As developers, what can we do about undefined behavior in C and C++?
• Only use these languages appropriately
• Use modern coding style
• Dynamic tools
– UBSan, ASan, Valgrind
– And test like crazy, use fuzzers, etc.
• Static analysis tools
– Enable and heed compiler warnings
– Lots more
Facts About UB in LLVM
• It exists to support generation of good code
• It is independent of undefined behavior in source or target languages
– You can compile a UB-free language to LLVM
• It comes in several flavors
• Reasoning about optimizations in the presence of UB is very difficult
• Compilers transform source programs to target programs in a series of steps, e.g.
– Swift → SIL
– SIL → LLVM
– LLVM → ARMv8
• At each step
– OK to remove UB
– Must not add UB
– This is refinement
• Example: Shift instructions are defined for shifts past bitwidth
– But different processors define it differently
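At the C level, shifting by the bit width or more is undefined, so portable code has to pick one convention explicitly. The sketch below shows two possible choices for 32-bit values. The function names are hypothetical. The masking variant mirrors what 32-bit x86 shifts do with the shift count, while the zero-fill variant is simply an assumed alternative.

```c
#include <assert.h>
#include <stdint.h>

/* Convention 1: shifting by 32 or more pushes every bit out, giving 0. */
static uint32_t shl_zero_fill(uint32_t value, unsigned amount) {
    return amount >= 32 ? 0u : value << amount;
}

/* Convention 2: only the low five bits of the amount are used. */
static uint32_t shl_masked(uint32_t value, unsigned amount) {
    return value << (amount & 31u);
}

/* The two conventions agree below 32 and differ from 32 upward. */
static void check_shift_conventions(void) {
    assert(shl_zero_fill(1u, 4) == shl_masked(1u, 4));
    assert(shl_zero_fill(1u, 32) == 0u);
    assert(shl_masked(1u, 32) == 1u);
}
```

Either function is a refinement of the undefined C shift: it removes UB without adding any.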
LLVM has three kinds of UB
1. Undef
– Explicit value in the IR
– Acts like a free-floating hardware register
• Takes all possible bit patterns at the specified width
• Can take a different value every time it is used
– Comes from uninitialized variables
– Further reading
• http://sunfishcode.github.io/blog/2014/07/14/undef-introduction.html
• We want this optimization:
%add = add nsw i32 %a, %b
%cmp = icmp sgt i32 %add, %a
=>
%cmp = icmp sgt i32 %b, 0
• But undef doesn’t let us do it:
%add = add nsw i32 %INT_MAX, %1
%cmp = icmp sgt i32 undef, %INT_MAX
• There’s no bit pattern we can substitute for the undef that makes %cmp = true
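The rewrite of `a + b > a` into `b > 0` can be checked by brute force at a small width. The sketch below does this for 8-bit signed values, using plain int arithmetic so that the check itself never overflows. The function name `count_disagreements` and the 8-bit width are illustrative assumptions.

```c
#include <assert.h>

/* Counts (a, b) pairs in the 8-bit signed range where "a + b > a" and
   "b > 0" disagree. With wrap == 0, pairs whose sum leaves the 8-bit range
   are skipped (the nsw assumption). With wrap != 0, the sum wraps around
   as in two's complement. */
static int count_disagreements(int wrap) {
    int count = 0;
    for (int a = -128; a <= 127; a++) {
        for (int b = -128; b <= 127; b++) {
            int sum = a + b;                      /* exact: computed in int */
            if (sum < -128 || sum > 127) {
                if (!wrap)
                    continue;                     /* excluded by assumption */
                sum += (sum > 127) ? -256 : 256;  /* wrap into 8 bits */
            }
            assert(sum >= -128 && sum <= 127);
            if ((sum > a) != (b > 0))
                count++;
        }
    }
    return count;
}
```

With overflowing pairs excluded the two predicates never disagree, and with wrapping they do, which is why the optimization needs overflow to be something stronger than an arbitrary bit pattern.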
LLVM has three kinds of UB
2. Poison
– Ephemeral effect of math instructions that violate
• nsw – no signed wrap for add, sub, mul, shl
• nuw – no unsigned wrap for add, sub, mul, shl
• exact – no remainder for sdiv, udiv, lshr, ashr
– Designed to support speculative execution of operations that might overflow
– Poison propagates via instruction results
– If poison reaches a side-effecting instruction, the result is true UB
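The propagation rule can be pictured with a toy model. The sketch below is an illustration of the bullets on this slide only, not LLVM's implementation or its exact semantics. The `toy_value` type and all `toy_` functions are hypothetical, and treating a store as the side-effecting instruction is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy value: a 32-bit integer plus a flag meaning "this is poison". */
typedef struct {
    int32_t bits;
    bool poison;
} toy_value;

static toy_value toy_const(int32_t v) {
    toy_value r = { v, false };
    return r;
}

/* Toy "add nsw": a poison operand or a signed overflow yields poison. */
static toy_value toy_add_nsw(toy_value a, toy_value b) {
    int64_t wide = (int64_t)a.bits + (int64_t)b.bits;
    toy_value r;
    r.poison = a.poison || b.poison || wide > INT32_MAX || wide < INT32_MIN;
    r.bits = r.poison ? 0 : (int32_t)wide;
    return r;
}

/* Toy side-effecting instruction: returns false at the point where, per
   the slide, poison reaching a side effect would be true UB. */
static bool toy_store(int32_t *slot, toy_value v) {
    assert(slot != NULL);
    if (v.poison)
        return false;
    *slot = v.bits;
    return true;
}
```

In this model an overflowing add can be computed speculatively at no cost, because nothing goes wrong until its result is actually stored.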
LLVM has three kinds of UB
3. True undefined behavior
– Triggered by
• Divide by zero
• Illegal memory accesses
– Anything can happen as a result
• Typically results in corrupted execution or a processor exception
• Which of these transformations is OK?
%result = add nsw i32 %a, %b
=>
%result = add i32 %a, %b
I’m OK
%result = add i32 %a, %b
=>
%result = add nsw i32 %a, %b
• Use Alive to do automated proofs about LLVM peephole optimizations:
– https://github.com/nunoplopes/alive
– Alive understands all three kinds of UB
$ ./alive.py add.opt
---------------------------------------
Optimization: 1
Precondition: true
%result = add nsw i32 %a, %b
=>
%result = add i32 %a, %b
Done: 1
Optimization is correct!
$ ./alive.py add-bad.opt
---------------------------------------
Optimization: 1
Precondition: true
%result = add i32 %a, %b
=>
%result = add nsw i32 %a, %b
ERROR: Domain of poisoness of Target is smaller than Source's for i32 %result
Example:
%a i32 = 0x7FFFEFFF (2147479551)
%b i32 = 0x7FFFFBFF (2147482623)
Source value: 0xFFFFEBFE (4294962174, -5122)
Target value: poison
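The counterexample can be replayed by hand: the plain add wraps to 0xFFFFEBFE, while the same operands overflow as signed 32-bit integers, so the nsw version has no defined value. A minimal sketch of that arithmetic follows, with hypothetical function names and unsigned arithmetic standing in for the wrapping `add i32`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrapping 32-bit add, standing in for plain "add i32". */
static uint32_t add_wrapping(uint32_t a, uint32_t b) {
    return a + b;
}

/* True when the signed sum does not fit in 32 bits, i.e. the case in
   which "add nsw i32" would produce poison. */
static bool add_nsw_overflows(int32_t a, int32_t b) {
    int64_t wide = (int64_t)a + (int64_t)b;
    return wide > INT32_MAX || wide < INT32_MIN;
}

/* Replays the operand values printed by Alive. */
static void replay_counterexample(void) {
    uint32_t a = 0x7FFFEFFFu, b = 0x7FFFFBFFu;
    assert(add_wrapping(a, b) == 0xFFFFEBFEu);
    assert(add_nsw_overflows((int32_t)a, (int32_t)b));
}
```

The source program defines a result for these inputs and the target does not, which is exactly the added UB that refinement forbids.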
• We translated a bunch of InstCombine patterns into Alive
– Found some wrong ones, reported bugs
– Found some missed opportunities to preserve UB flags (nsw, nuw, exact)
• Details can be found in a paper
– http://www.cs.utah.edu/~regehr/papers/pldi15.pdf
• Please try out Alive if you reason about peephole optimizations in LLVM
Conflicting design goals for LLVM UB
1. Enable all optimizations that we want to perform
2. Be internally consistent
3. Be consistent with the LLVM implementation
The current scheme generally works fine
• But it’s not clear that it actually meets any of these three goals
• Nuno Lopes is heading an effort to rework poison and undef
– Currently they are (we think) unnecessarily complicated
– Goal is to make undef a bit stronger and drop poison entirely
– No change to “true UB”
• Other compilers (GCC, Microsoft) have similar UB-related concepts
– Detailed specifications are hard to find
– Same motivation: support efficient codegen
Thanks!