COP 5021 meeting -*- Outline -*-

* Introduction

** Who

introduce self and meet everyone
have them write names on board

------------------------------------------
Instructor: Gary T. Leavens
HEC 329
Leavens@ucf.edu
407-823-4758

handouts: course policies and HW 0
------------------------------------------

Use the ucf.edu email or webcourses for questions

** what is program analysis?

Q: What's the difference between static and dynamic properties?

------------------------------------------
        COP 5021 PROGRAM ANALYSIS

WHAT IS PROGRAM ANALYSIS?

Def: *program analysis* is



------------------------------------------
... predicting statically safe approximations to the set of
    configurations or behaviors that may occur dynamically.

Q: How does this differ from (human) code inspection?
   It is automated, so it can handle larger, more complex code bases.

Q: How is it different from testing? From runtime assertion checking?
   It is done without running the program and takes all possible
   executions into account, so errors it finds might not be real
   (due to approximation), but if the program is certified, then it
   has the given property.

** why study program analysis (course spec)

------------------------------------------
WHY PROGRAM ANALYSIS?

Automatic understanding of programs is
 - important for:
   + optimizing compilers
   + program development tools
   + formal verification:
     - safety critical systems
     - business critical systems
   + computer security
     - finding vulnerabilities
     - assurance for critical systems
   + research in programming languages
 - impossible, in general
   + safe approximations
     e.g.,
       read(x);
       (if x > 0 then y:= 1 else {y:= 2; f()});
       z:=y
     -- can we say that z is 1 at the end?
Basic ideas:
 - compute abstractions
 - use them in transformations

Goals:
 - little or no input from programmers
 - correctness
 - efficiency (at compile time):
   - time
   - space
------------------------------------------

no input ==> practical, usable
correctness ==> usable "under the covers"

Program analysis encompasses the core areas in the theory of
programming language research, including type systems, program
optimization, and reasoning about programs.

Generally speaking, the bias is towards having no programmer input,
which tends to lead to fairly global (whole-program) analyses.

Q: What's not a goal?
   modularity (whole-program analyses are often accepted)
   simplicity (analyses are sometimes very mathematically sophisticated)

*** main ideas

------------------------------------------
MAIN IDEAS OR THEMES

- conservatism: "Err on the safe side!"

- efficiency from approximation:
    "Trade precision for efficiency!"
------------------------------------------

Q: What does safety mean?
   nothing bad happens
   If the analysis says there are no errors, there can't be any at runtime.
   Example: weather forecasting of rain.

Q: What does it mean to be conservative when taking money for a trip?
   take more than you think you'll need

Q: What's an example of the first idea from type checking?
   If the compiler says there are no type errors, then there are none.

Q: What's type safety?
   Lack of runtime type errors
   But: what's a type error... (we'll get into that)

Go over the PowerPoint about approximation (see approximation.pptx) now

Q: Suppose we're interested in numerical precision (error estimation),
   what's an example of the second idea in this case?
   Let the analysis be more approximate if that makes it faster.
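A minimal sketch of "err on the safe side", applied to the read(x); ... z:=y example on the WHY PROGRAM ANALYSIS slide. It is written in Python only for illustration and uses a hypothetical set-of-values abstraction: each variable maps to the finite set of values it may hold, or to None meaning "any integer". Since x > 0 cannot be decided statically, both branches are analyzed and their results joined. It also assumes, hypothetically, that f() does not assign y and may return.

```python
from typing import Dict, FrozenSet, Optional

# Abstract value: a finite set of integers the variable may hold,
# or None meaning "any integer" (top). This domain is a hypothetical choice.
AbsVal = Optional[FrozenSet[int]]
State = Dict[str, AbsVal]


def join_val(a: AbsVal, b: AbsVal) -> AbsVal:
    """Least upper bound of two abstract values: set union, with top absorbing."""
    if a is None or b is None:
        return None
    return a | b


def join(s1: State, s2: State) -> State:
    """Pointwise join of two abstract states over the same variables."""
    assert s1.keys() == s2.keys(), "sketch assumes both branches assign the same variables"
    return {v: join_val(s1[v], s2[v]) for v in s1}


def analyze_example() -> State:
    """Analyze: read(x); (if x > 0 then y:= 1 else {y:= 2; f()}); z:=y"""
    s: State = {"x": None}  # read(x): the input is unknown, so x may be any integer
    # x > 0 cannot be decided statically, so both branches are analyzed.
    then_state = dict(s, y=frozenset({1}))  # y := 1
    else_state = dict(s, y=frozenset({2}))  # y := 2
    # f(): hypothetically assumed not to assign y. Whether f returns is
    # undecidable in general, so the sketch assumes it may return.
    after_if = join(then_state, else_state)
    after_if["z"] = after_if["y"]  # z := y
    return after_if


result = analyze_example()
print(sorted(result["z"]))  # prints [1, 2]
```

The answer z ∈ {1, 2} is safe, since every value z can actually take is included, so the analysis cannot claim that z is 1. It may be imprecise: if f() never returns, z is always 1 at the end, but establishing that would require deciding termination.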
------------------------------------------
PRECISION AND RECALL

def: the *precision* of an analysis is the fraction of


def: the *recall* of an analysis is the fraction of


Example: Suppose a program has 10 vulnerabilities,
  and a tool identifies 8 places, but only 6 of those
  are actual vulnerabilities.

  The precision is

  The recall is
------------------------------------------
... identified problems that are actually problems
... the actual problems that are found
... 6/8 = 0.75 (true positives/number of positives output)
... 6/10 = 0.6 (true positives/number of actual problems)

Q: What is the goal for precision? For recall?
   ideally we would like both precision and recall to be 1

Q: Can we do that?
   No, that is impossible in general for program analysis

Q: Which is worse for analysis of security vulnerabilities:
   poor precision or poor recall?
   Poor recall is certainly bad; it means that you miss vulnerabilities.
   But poor precision is also bad; it means that people won't pay attention.
   Security is an exception to the usual conservatism, in that precision
   tends to be a requirement, and people are willing to trade poor recall
   for good precision.
   Imagine a virus checker: people stop using it if it gives more than
   a few false positives.
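A small sketch of the two definitions, checked against the numbers in the example. The program locations are hypothetical: locations 0..9 stand for the 10 real vulnerabilities, and the tool reports 8 locations, of which 6 are real and 2 (100 and 101) are false alarms.

```python
from fractions import Fraction
from typing import Set, Tuple


def precision_recall(reported: Set[int], actual: Set[int]) -> Tuple[Fraction, Fraction]:
    """Return (precision, recall) as exact fractions; both sets must be non-empty.

    precision = |reported & actual| / |reported|
    recall    = |reported & actual| / |actual|
    """
    true_positives = len(reported & actual)
    return Fraction(true_positives, len(reported)), Fraction(true_positives, len(actual))


# Hypothetical locations: 10 real vulnerabilities at 0..9; the tool reports
# 0..5 (real) plus 100 and 101 (false alarms), i.e. 8 reports in total.
actual = set(range(10))
reported = set(range(6)) | {100, 101}
p, r = precision_recall(reported, actual)
print(p, r)  # prints 3/4 3/5
```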
*** practicality

These ideas are at the heart of many compilers and language systems, e.g.,
 - abstract interpretation used in verifying Airbus software (the Astrée tool)
 - type systems and other static analysis in the JML compiler
 - taint checking in Perl and Ruby

*** widely used

Many papers assume one understands these ideas

Lots of different applications including:
 - database query optimization
 - memory allocation checking for embedded systems
 - checking byte code on the JVM
 - alias (points-to) analysis, to allow optimizations
 - side effects or purity
 - field accesses
 - potential for variables to be null
 - array indexes out of bounds
 - uninitialized variable accesses
 - deadlock prevention
 - race conditions
 - security
   - vulnerability checking (like flawfinder)
   - information flow analysis
 - disassembly/program understanding
 - assurance

*** other interest

 - relation to operational semantics
 - connections between the different kinds of analysis are interesting,
   and provide a unifying set of ideas

Q: What about the material interests you?

** Plan of course (syllabus)

 - overview, survey
 - basic use of Visual Studio extensions
 - dataflow analysis
 - implementation of dataflow analysis using Visual Studio
 - structural operational semantics and correctness
 - applications (e.g., to security)
 if time:
 - abstract interpretation
 - type and effect systems

summary and review at the end

Q: Would you make any changes to the plan?

** Objectives

*** meta

 - get you to think critically
   Q: What kind of questions should you be asking?
      limitations? utility?
 - teach you some semantics and formal methods

*** normal

In one sentence, the main objective is that you will be able to use and
apply the principles of program analysis in the construction of software
tools.

We focus on procedural, sequential programs (the WHILE language), but
will extend to other areas and styles of programs in projects.
------------------------------------------
OBJECTIVES

- [Ideas] Correctly understand and use the terminology of program
  analysis when reading and writing papers and when designing
  software tools

- [ImproveTools] Effectively apply the concepts to design better
  software tools and programming languages.
------------------------------------------

See the course's about page for details, including outcomes

** How I'll run the course

*** overview

informal and friendly

lecture meetings:
  discuss homework, if any (show program examples, or on board)
  discuss next topic (working examples)
  You need to read ahead or at least keep up with the reading.

homework: explore the material, perhaps generalize or apply it
  (esp. to OOP, AOP, components, security, etc.)
  Can work alone or with others.
  Groups are recommended for the project.

grading: based on evidence, participation, project, exams.
  we'll give comments and grades on homework

pace: we'll try to uncover and explore carefully
  want deep understanding of that material (semantics)
  for homework, we'll be flexible

*** red tape

prerequisites: COP 4020 and COT 4210

book: Principles of Program Analysis,
  by Flemming Nielson, Hanne Riis Nielson, and Chris Hankin
  (Springer-Verlag, 1999, corrected printing 2005). ISBN 3-540-65410-0.

** summary

Q: any other questions about the course?

** task

------------------------------------------
YOUR TASK

READ THE BOOK!
  See the readings in the syllabus

Goal: understand the material, so ask questions!
------------------------------------------