Data-flow analyzers in mbeddr

 

Data-flow analysis derives information about the dynamic behavior of a program by only examining the static code. The derived information consists of, for example, (1) the set of uninitialized variables at a given point in the program, (2) the possible targets of a pointer or (3) liveness of assignments to see if the assigned variable is used at a later point in the program. A clever compiler can use this information to produce better code, where better may mean faster, safer, more robust or even more energy efficient.

In mbeddr, the above described analyses are executed immediately on the typed-in code, which helps you to reason about the program code and also to better understand what may be going on during the actual execution. We have implemented these analyses using the powerful data-flow facilities of MPS and tweaked them a little bit with additional language extensions. This blog post aims at giving a glimpse of what these analyzers can do. If you are more interested in the technical details, please refer to this presentation: https://prezi.com/_mtayndadtv_/data-flow-analysis (this was used to guide an in-house session on the topic) or to http://www.itu.dk/people/brabrand/UFPE/Data-Flow-Analysis/static.pdf.

Points-to analysis


The analysis is responsible for computing the set of targets that a given pointer variable can point to. It is often a building block for further analysis and, in mbeddr, we use it in the uninitialized read analyzer to derive more precise results.

Liveness

A variable is said to be live at a given program point, if its value may be read during the remaining execution of the program. The liveness analyzer gives hints about assignments which are useless, because the assigned variable does not appear later on in the right hand side of another assignment.

In the example code snippet the last assignment to z is underlined and a warning is given, because (1) at the next iteration of the loop we assign a new value to z before we would use the old value or (2) we exit the loop and z is not used afterwards.

Uninitialized read


An uninitialized variable is a variable which is declared but is not set to a definite known value before it is used. This is a major source of program errors in all programming languages and C is not an exception, in fact, pointers just complicate the problem even further because uninitialized pointer variables just point to some junk in the memory. Dereferencing these memory addresses can cause serious flaws in the program.

mbeddr comes with an analyzer to check for reads from uninitialized variables. According to the classification among data-flow analyzers, the current implementation is a flow-sensitive, path-insensitive and context-insensitive analyzer;
  • flow-sensitive: it takes into account the order of the statements in your C function.
  • path-insensitive: we do not perform any kind of interpretation on conditional branches, loops, etc so the analyzer doesn't know about this information.
  • context-insensitive: all functions are analyzed in separation and we do not follow function calls, nor rely on context information. This is less precise than a context-sensitive analyzer, but it was a design decision to implement it that way, because we execute the analyzer in an interactive fashion after editing the source code. Performing a full context-sensitive analysis would require a longer time and would break the editing flow in the IDE. In order to compensate for the lost precision, we have extended the core C language of mbeddr with annotations on function arguments.

Improving the precision of the analyzer

Two extensions are used to improve the precision of the analyzer and make up for the precision that may be lost due to the context-insensitive nature.
(1) The analyzer uses the results of the points-to analysis to get accurate information about pointer mappings.
(2) The user can annotate the function arguments with the in/out keywords which have the following semantics:
An IN argument must be initialized at the time when the function is called and cannot be modified in a way that has an effect outside of the function. This means that if the argument has a pointer type, it cannot be dereferenced and assigned inside the function.
An OUT argument must be initialized at the time when the function is called and must be modified in a way that has an effect outside of the function. This means that an OUT annotation can only be used for pointer types and the argument must be dereferenced and assigned inside the function body. These semantics are enforced while editing the code and error markers are used on the offending arguments if they do not fulfil them.

Examples

This is the basic example of reading from an uninitialized variable (a). The IDE marks the read as an error.
We create a pointer b, which points to the address of a. Then, we dereference b and assign a value to it. After this point, a can be safely read, its value is initialized. The analyzer relies on the results of the points-to analysis to provide accurate results.
We pass the address of a to the function call2. The argument in call2 is marked as OUT, which means there must be a write to it which is visible to the outside. Doing otherwise would yield an error in the call2 function. In the call1 function we rely on this information and know that after the function call we can use the value of a because it is initialized.

The topic of analyzers is a rather new feature in mbeddr so feel free to try it out and play around with it. On the other hand, these analyzers will always compute only an approximation of the program properties. Ultimately, our goal is to rather provide a smaller number of true positive examples for probable bugs, then to underline everything in the source code due to false positives.