Undefined Behavior in LLVM

John Regehr
Trust-in-Soft / University of Utah
•  sqrt(-1) = ?
–  i
–  NaN
–  Arbitrary value
–  Exception
–  Undefined behavior
•  Undefined behavior (UB) is a design choice
–  System designers use UB when they don’t feel like committing (or can’t commit) to any particular semantics
Undefined behavior is undefined
•  Technically, anything can happen next
–  “Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to having demons fly out of your nose.”
•  In practice, UB is implemented lazily: by assuming it will never happen
(image from @whitequark)
(image from EvilTeach on Stack Overflow)
Common consequences include…
•  Predictable and useful result on one platform, different result on another platform
•  Unpredictable or nonsensical result
•  Memory corruption
•  Remote code execution
•  Trap or fault
•  No consequences at all
•  AVR32 (embedded CPU):
•  Scheme R6RS:
•  C/C++ have tons and tons of undefined behaviors
–  divide by zero, use of dangling pointer, shift past bitwidth, signed integer overflow, …
•  LLVM has undefined behavior too
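Two of the C behaviors listed above (divide by zero and shift past bitwidth) can be guarded with explicit checks. The following is a minimal sketch; `checked_div` and `checked_shl` are hypothetical helper names introduced only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: divide only when the result is defined.
   Rejects division by zero and INT_MIN / -1 (whose result overflows). */
bool checked_div(int a, int b, int *out) {
    assert(out != NULL);
    if (b == 0 || (a == INT_MIN && b == -1))
        return false;
    *out = a / b;
    return true;
}

/* Hypothetical helper: left-shift an unsigned value only when the
   shift count is smaller than the bit width of the type. */
bool checked_shl(unsigned x, unsigned count, unsigned *out) {
    assert(out != NULL);
    if (count >= sizeof(unsigned) * CHAR_BIT)
        return false;
    *out = x << count;
    return true;
}
```

A caller tests the return value before reading `*out`; when the helper returns false, `*out` is left untouched.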
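#include <limits.h>
#include <stdio.h>

int foo (int x) {
  return (x + 1) > x;   /* signed overflow (UB) when x == INT_MAX */
}

int main () {
  printf("%d\n", (INT_MAX + 1) > INT_MAX);   /* UB in a constant expression */
  printf("%d\n", foo(INT_MAX));              /* UB inside foo */
  return 0;
}

$ gcc -O2 intmax-overflow.c ; ./a.out
0
1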
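The two lines of output above disagree because the program overflows a signed int. One way to get a single defined answer is to test for the boundary before adding. This is a sketch only; `add_would_overflow` and `foo_defined` are hypothetical names, not part of the original program.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True when a + b would not fit in an int. Only comparisons and
   subtractions that cannot themselves overflow are used. */
bool add_would_overflow(int a, int b) {
    if (b > 0)
        return a > INT_MAX - b;
    return a < INT_MIN - b;
}

/* Same question as foo(), but with a defined answer for every x:
   the addition is performed only when it cannot overflow. */
int foo_defined(int x) {
    if (add_would_overflow(x, 1))
        return 0;
    return (x + 1) > x;
}
```

Returning 0 in the overflow case is a choice made for this sketch (it matches what wrapping arithmetic would report); a caller could just as well treat that case as an error.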
#include <stdio.h>
#include <stdlib.h>

int main() {
  int *p = (int*)malloc(sizeof(int));
  int *q = (int*)realloc(p, sizeof(int));
  *p = 1;   /* UB: p is dangling once realloc succeeds */
  *q = 2;
  if (p == q)
    printf("%d %d\n", *p, *q);
}

$ clang -O realloc.c ; ./a.out
1 2
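The surprising "1 2" above comes from writing through p after realloc has taken ownership of the block. A UB-free variant uses only the pointer that realloc returns. This is a minimal sketch; `store_after_realloc` is a hypothetical function name.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns the value stored through the pointer that realloc handed
   back, or -1 if an allocation fails. */
int store_after_realloc(void) {
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    *p = 1;                          /* p is still valid here */

    int *q = realloc(p, sizeof *q);
    if (q == NULL) {                 /* on failure the old block survives */
        free(p);
        return -1;
    }
    /* From here on p must not be used, even if the block did not
       move; only q refers to a live object. */
    *q = 2;
    int result = *q;
    free(q);
    return result;
}
```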
Without -DDEBUG
void foo(char *p) {
#ifdef DEBUG
  printf("%s\n", p);
#endif
  if (p != 0)
    bar(p);
}

_foo:
    testq   %rdi, %rdi
    je      L1
    jmp     _bar
L1: ret
With -DDEBUG
void foo(char *p) {
#ifdef DEBUG
  printf("%s\n", p);
#endif
  if (p != 0)
    bar(p);
}

_foo:
    pushq   %rbx
    movq    %rdi, %rbx
    call    _puts
    movq    %rbx, %rdi
    popq    %rbx
    jmp     _bar
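In the -DDEBUG build above, the print reads through p before the null test, so the compiler is free to assume p is non-null and drop the test. Checking first keeps the test meaningful in both builds. This is a sketch under assumptions: `foo_checked` is a hypothetical name, and `bar` here is a stand-in stub that only counts calls.

```c
#include <assert.h>
#include <stdio.h>

static int bar_calls = 0;           /* stand-in for the real bar() */
static void bar(const char *p) {
    (void)p;
    bar_calls++;
}

/* Check the pointer before anything reads through it, so no code
   path passes a null pointer to printf. */
void foo_checked(const char *p) {
    if (p == NULL)
        return;
    printf("%s\n", p);
    bar(p);
}
```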
As developers, what can we do about undefined behavior in C and C++?
•  Only use these languages appropriately
•  Use modern coding style
•  Dynamic tools
–  UBSan, ASan, Valgrind
–  And test like crazy, use fuzzers, etc.
•  Static analysis tools
–  Enable and heed compiler warnings
–  Lots more
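As one illustration of "test like crazy", a function can be compared against a wider-precision reference on many pseudo-random inputs. The sketch below assumes a 32-bit int; `saturating_add`, `next_int`, and `random_test` are hypothetical names, and the generator is a simple linear congruential generator chosen only so that runs are reproducible.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical function under test: addition that clamps to the int
   range instead of overflowing. */
int saturating_add(int a, int b) {
    if (b > 0 && a > INT_MAX - b) return INT_MAX;
    if (b < 0 && a < INT_MIN - b) return INT_MIN;
    return a + b;
}

/* Deterministic 64-bit linear congruential generator. The top 32 bits
   are reinterpreted as a signed value (assumes a 32-bit int). */
static uint64_t rng_state = 12345;
static int next_int(void) {
    rng_state = rng_state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (int)(int32_t)(uint32_t)(rng_state >> 32);
}

/* Compares saturating_add against a reference computed in 64 bits.
   Returns the number of disagreements. */
int random_test(int iterations) {
    int failures = 0;
    for (int i = 0; i < iterations; i++) {
        int a = next_int(), b = next_int();
        int64_t wide = (int64_t)a + (int64_t)b;
        if (wide > INT_MAX) wide = INT_MAX;
        if (wide < INT_MIN) wide = INT_MIN;
        if (saturating_add(a, b) != wide)
            failures++;
    }
    return failures;
}
```

Building such a test with `-fsanitize=undefined,address` (accepted by both Clang and GCC) lets UBSan and ASan report problems that happen not to change the result.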
Facts About UB in LLVM
•  It exists to support generation of good code
•  It is independent of undefined behavior in source or target languages
–  You can compile a UB-free language to LLVM
•  It comes in several flavors
•  Reasoning about optimizations in the presence of UB is very difficult
•  Compilers transform source programs to target programs in a series of steps, e.g.
–  Swift → SIL
–  SIL → LLVM
–  LLVM → ARMv8
•  At each step
–  OK to remove UB
–  Must not add UB
–  This is refinement
•  Example: hardware shift instructions are defined for shifts past bitwidth
–  But different processors define it differently
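The shift example above can be made concrete with two simplified software models of hardware behavior. These are sketches based on the commonly documented behavior of 32-bit x86 shifts and classic 32-bit ARM register shifts, not exact descriptions of any specific chip; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of a 32-bit x86 shift: the hardware uses only the
   low 5 bits of the count. */
uint32_t shl_x86_style(uint32_t x, uint32_t count) {
    return x << (count & 31u);
}

/* Simplified model of a classic 32-bit ARM register shift: the low
   8 bits of the count are used, and counts of 32 or more give 0. */
uint32_t shl_arm32_style(uint32_t x, uint32_t count) {
    uint32_t n = count & 255u;
    return n >= 32u ? 0u : x << n;
}
```

The two models agree for counts below 32 and differ at 32 and beyond. Because the IR-level shift does not define a result for such counts, lowering it to either model only removes UB, which is what refinement permits.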
LLVM has three kinds of UB
1.  Undef
–  Explicit value in the IR
–  Acts like a free-floating hardware register
•  Takes all possible bit patterns at the specified width
•  Can take a different value every time it is used
–  Comes from uninitialized variables
–  Further reading
•  http://sunfishcode.github.io/blog/2014/07/14/undef-introduction.html
•  We want this optimization:
%add = add nsw i32 %a, %b
%cmp = icmp sgt i32 %add, %a
=>
%cmp = icmp sgt i32 %b, 0
•  But undef doesn’t let us do it (suppose an overflowing add nsw merely produced undef):
%add = add nsw i32 %INT_MAX, %1
%cmp = icmp sgt i32 undef, %INT_MAX
•  There’s no bit pattern we can substitute for the undef that makes %cmp = true, yet the optimized code returns true because 1 > 0
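The argument above can be checked exhaustively at a smaller width. The sketch below works with 8-bit signed values instead of i32 (an assumption made so the loops stay small) and uses hypothetical function names.

```c
#include <assert.h>
#include <stdint.h>

/* Counts inputs where (a + b > a) and (b > 0) disagree, looking only
   at pairs whose 8-bit signed sum does not overflow. */
int mismatches_without_overflow(void) {
    int mismatches = 0;
    for (int a = INT8_MIN; a <= INT8_MAX; a++) {
        for (int b = INT8_MIN; b <= INT8_MAX; b++) {
            int sum = a + b;                 /* exact, computed in int */
            if (sum < INT8_MIN || sum > INT8_MAX)
                continue;                    /* overflow case: skipped */
            if ((sum > a) != (b > 0))
                mismatches++;
        }
    }
    return mismatches;
}

/* If an overflowing add only produced undef, the compare would see
   some arbitrary 8-bit value v. Counts the v for which v > a holds. */
int undef_values_making_cmp_true(int a) {
    int count = 0;
    for (int v = INT8_MIN; v <= INT8_MAX; v++)
        if (v > a)
            count++;
    return count;
}
```

The first function finds no mismatch, so the rewrite is sound whenever the add does not overflow. The second shows that with a at the maximum value no choice of v makes the original compare true, while the rewritten compare (b > 0 with b = 1) is true, so undef alone is too weak to justify the optimization.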
LLVM has three kinds of UB
2.  Poison
–  Ephemeral effect of math instructions that violate
•  nsw – no signed wrap for add, sub, mul, shl
•  nuw – no unsigned wrap for add, sub, mul, shl
•  exact – no remainder for sdiv, udiv, lshr, ashr
–  Designed to support speculative execution of operations that might overflow
–  Poison propagates via instruction results
–  If poison reaches a side-effecting instruction, the result is true UB
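The propagation rules above can be mimicked with a toy value type that carries a poison flag. This is only an illustrative model following the rules as stated on the slide, not LLVM's implementation; `Val`, `concrete`, `add_nsw`, `icmp_sgt`, and `store` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of an i32 IR value: either a concrete number or poison. */
typedef struct {
    int32_t value;   /* meaningful only when poison is false */
    bool poison;
} Val;

Val concrete(int32_t v) { Val r = { v, false }; return r; }

/* add nsw: poison if either input is poison or the signed sum wraps. */
Val add_nsw(Val a, Val b) {
    Val r = { 0, true };
    if (a.poison || b.poison)
        return r;
    int64_t wide = (int64_t)a.value + (int64_t)b.value;
    if (wide < INT32_MIN || wide > INT32_MAX)
        return r;
    r.value = (int32_t)wide;
    r.poison = false;
    return r;
}

/* icmp sgt: no side effect, so poison simply flows through. */
Val icmp_sgt(Val a, Val b) {
    Val r = { 0, true };
    if (a.poison || b.poison)
        return r;
    r.value = a.value > b.value;
    r.poison = false;
    return r;
}

/* A store has a side effect. Returns false to signal that the model
   hit true UB (a poison value reached the store). */
bool store(int32_t *slot, Val v) {
    if (v.poison)
        return false;
    *slot = v.value;
    return true;
}
```

Because `add_nsw` and `icmp_sgt` only mark their result, they can be executed speculatively; nothing goes wrong unless the poisoned result is eventually passed to `store`.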
LLVM has three kinds of UB
3.  True undefined behavior
–  Triggered by
•  Divide by zero
•  Illegal memory accesses
–  Anything can happen as a result
•  Typically results in corrupted execution or a processor exception
•  Which of these transformations is OK?
%result = add nsw i32 %a, %b
=>
%result = add i32 %a, %b
I’m OK
%result = add i32 %a, %b
=>
%result = add nsw i32 %a, %b
•  Use Alive to do automated proofs about LLVM peephole optimizations:
–  https://github.com/nunoplopes/alive
–  Alive understands all three kinds of UB
$ ./alive.py add.opt
----------------------------------------
Optimization: 1
Precondition: true
%result = add nsw i32 %a, %b
=>
%result = add i32 %a, %b
Done: 1
Optimization is correct!

$ ./alive.py add-bad.opt
----------------------------------------
Optimization: 1
Precondition: true
%result = add i32 %a, %b
=>
%result = add nsw i32 %a, %b
ERROR: Domain of poisoness of Target is smaller than Source's for i32 %result
Example:
%a i32 = 0x7FFFEFFF (2147479551)
%b i32 = 0x7FFFFBFF (2147482623)
Source value: 0xFFFFEBFE (4294962174, -5122)
Target value: poison
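The same two verdicts can be reproduced by brute force at a small bit width. Alive itself discharges these checks with an SMT solver; the loops below are only a stand-in at 8 bits, with hypothetical function names.

```c
#include <assert.h>
#include <stdint.h>

/* Wrapping 8-bit signed add, the model for plain "add". */
static int add_wrap8(int a, int b) {
    int sum = a + b;
    if (sum > INT8_MAX) sum -= 256;
    if (sum < INT8_MIN) sum += 256;
    return sum;
}

/* True when the exact sum does not fit in 8 signed bits, i.e. when
   "add nsw" would be poison. */
static int add_nsw_is_poison8(int a, int b) {
    int sum = a + b;
    return sum < INT8_MIN || sum > INT8_MAX;
}

/* Source "add nsw", target "add": counts inputs where the source is
   defined but the target computes a different value. */
int violations_dropping_nsw(void) {
    int bad = 0;
    for (int a = INT8_MIN; a <= INT8_MAX; a++)
        for (int b = INT8_MIN; b <= INT8_MAX; b++)
            if (!add_nsw_is_poison8(a, b) && add_wrap8(a, b) != a + b)
                bad++;
    return bad;
}

/* Source "add", target "add nsw": the source is always defined, so
   every input that makes the target poison is a violation. */
int violations_adding_nsw(void) {
    int bad = 0;
    for (int a = INT8_MIN; a <= INT8_MAX; a++)
        for (int b = INT8_MIN; b <= INT8_MAX; b++)
            if (add_nsw_is_poison8(a, b))
                bad++;
    return bad;
}
```

Dropping nsw yields no violations, while adding nsw yields one for every overflowing input pair, which mirrors the counterexample Alive prints above.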
•  We translated a bunch of InstCombine patterns into Alive
–  Found some wrong ones, reported bugs
–  Found some missed opportunities to preserve UB flags (nsw, nuw, exact)
•  Details can be found in a paper
–  http://www.cs.utah.edu/~regehr/papers/pldi15.pdf
•  Please try out Alive if you reason about peephole optimizations in LLVM
Conflicting design goals for LLVM UB
1.  Enable all optimizations that we want to perform
2.  Be internally consistent
3.  Be consistent with the LLVM implementation
The current scheme generally works fine
•  But it’s not clear that it actually meets any of these three goals
•  Nuno Lopes is heading an effort to rework poison and undef
–  Currently they are (we think) unnecessarily complicated
–  Goal is to make undef a bit stronger and drop poison entirely
–  No change to “true UB”
•  Other compilers (GCC, Microsoft) have similar UB-related concepts
–  Detailed specifications are hard to find
–  Same motivation: support efficient codegen
Thanks!