Guidelines for developing safety-critical systems
Development practices according to the V-model
The V-model defines three parts of the project:
- Design
- Implementation
- Verification
Particular standards, such as EN50128 for the railway industry, lay out the recommended practices within each of those parts, with the aim of increasing the safety of the future product. In EN50128 these practices are organized in tables that give recommendations for each Safety Integrity Level (SIL). Additionally, every technique is described individually.
Software Design
Design should be carried out in a structured way (structured methodology). The system boundaries should be defined, the requirements identified, and the whole project refined incrementally by breaking bigger problems into smaller ones. Checklists, diagrams, state machines, and computer tools help in this process. Informal methods may also be used to make the process easier and more understandable. Some of the resulting design schemes will later be included in the official documentation. Decisions should be discussed with the whole team during the Design Review.
The standard also specifies more technical aspects to consider, such as:
- Graceful Degradation: containing the effects of an error in one part of the system to minimize its influence on the remaining parts;
- Diverse Programming: implementing the same functionality multiple times, e.g. on different processors, to make the system resistant to particular categories of errors.
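Diverse Programming can be illustrated with a minimal sketch in C. It assumes a hypothetical task, computing an integer square root, solved by two deliberately different algorithms whose results are compared before use. All names (isqrt_binary_search, isqrt_bitwise, isqrt_diverse, checked_result_t) are made up for this example; in a real system the variants would more likely be written by separate teams or run on separate processors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Variant A: binary search for floor(sqrt(x)). */
static uint32_t isqrt_binary_search(uint32_t x)
{
    uint32_t lo = 0U;
    uint32_t hi = 65535U; /* floor(sqrt(UINT32_MAX)) */

    while (lo < hi) {
        uint32_t mid = lo + ((hi - lo + 1U) / 2U);
        if ((mid * mid) <= x) { /* mid <= 65535, so mid * mid fits in 32 bits */
            lo = mid;
        } else {
            hi = mid - 1U;
        }
    }
    return lo;
}

/* Variant B: digit-by-digit (bitwise) method, no multiplication. */
static uint32_t isqrt_bitwise(uint32_t x)
{
    uint32_t result = 0U;
    uint32_t bit = (uint32_t)1U << 30; /* highest power of four in 32 bits */

    while (bit > x) {
        bit >>= 2;
    }
    while (bit != 0U) {
        if (x >= (result + bit)) {
            x -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;
    }
    return result;
}

typedef struct {
    bool     valid; /* false when the two variants disagree */
    uint32_t value; /* meaningful only when valid is true */
} checked_result_t;

/* Comparator: accept the result only if both variants agree. */
checked_result_t isqrt_diverse(uint32_t x)
{
    const uint32_t a = isqrt_binary_search(x);
    const uint32_t b = isqrt_bitwise(x);
    checked_result_t r;

    r.valid = (a == b);
    r.value = r.valid ? a : 0U;
    return r;
}
```

A caller would treat valid == false as a detected fault and react accordingly, instead of using a value that only one variant produced.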
What is more, the standard clearly discourages some practices. A good example is changing the configuration dynamically while the program is running. Instead, the change should be applied in the initialization stage, so that the system then works in accordance with the new setting. Similarly, it is a mistake to restore the system to its previous working state after a fatal error occurs. In such a case we should limit the error's impact by entering the safe-state procedure and waiting for the operator's intervention.
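Both rules can be sketched as a small state machine in C. This is only an illustration under assumed names: controller_t, its functions, and the max_speed_kmh setting are hypothetical and do not come from EN50128. Configuration is accepted only before start-up, and a fatal error leads to a safe state that only an operator reset can leave.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { STATE_INIT, STATE_RUNNING, STATE_SAFE } system_state_t;

typedef struct {
    system_state_t state;
    uint32_t       max_speed_kmh; /* hypothetical configuration item */
} controller_t;

/* All functions assume c points to a valid controller_t. */
void controller_init(controller_t *c)
{
    c->state = STATE_INIT;
    c->max_speed_kmh = 0U;
}

/* Configuration is accepted only during initialization. */
bool controller_configure(controller_t *c, uint32_t max_speed_kmh)
{
    bool accepted = false;

    if (c->state == STATE_INIT) {
        c->max_speed_kmh = max_speed_kmh;
        accepted = true;
    }
    return accepted;
}

void controller_start(controller_t *c)
{
    if (c->state == STATE_INIT) {
        c->state = STATE_RUNNING;
    }
}

/* A fatal error never restores the previous state; it enters the safe state. */
void controller_report_fatal_error(controller_t *c)
{
    c->state = STATE_SAFE;
}

/* Only an explicit operator action leaves the safe state, via re-initialization. */
bool controller_operator_reset(controller_t *c)
{
    bool done = false;

    if (c->state == STATE_SAFE) {
        controller_init(c);
        done = true;
    }
    return done;
}

/* Outside normal operation the most restrictive value is reported. */
uint32_t controller_allowed_speed(const controller_t *c)
{
    return (c->state == STATE_RUNNING) ? c->max_speed_kmh : 0U;
}
```

Note that controller_start does nothing in the safe state, so the system cannot silently resume operation after a fatal error.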
Implementation
Another set of guidelines in the EN50128 standard concerns implementation. To begin with, it states that compiled, strongly typed languages should be used, so that as many dangerous constructs as possible are rejected at compile time. Those that compile nonetheless should be described in the Coding Standard document and caught by static analysis. This is especially important for C and C++, as they accept many questionable constructs. A widely used coding standard in this regard is MISRA C.
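As an illustration of a questionable construct that C accepts, consider an implicit narrowing conversion. The sketch below is a hypothetical example (the function names are made up) and does not quote any particular rule. The first function compiles under default compiler settings but silently discards the upper byte. The second makes the conversion explicit and range-checked, which is the kind of form a coding standard and a static analyzer would typically push towards.

```c
#include <stdint.h>

/* Compiles in C, but silently keeps only the low 8 bits of raw. */
uint8_t narrow_unchecked(uint16_t raw)
{
    uint8_t value = raw; /* implicit narrowing conversion */
    return value;
}

/* Explicit, range-checked alternative: saturates instead of wrapping. */
uint8_t narrow_saturating(uint16_t raw)
{
    uint8_t value;

    if (raw > (uint16_t)UINT8_MAX) {
        value = UINT8_MAX;
    } else {
        value = (uint8_t)raw;
    }
    return value;
}
```

For an input of 300 the unchecked version returns 44 (300 modulo 256), while the saturating version returns 255. Whether saturation or an error report is the right reaction depends on the application; the point is that the decision is visible in the code.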
Other recommendations indicate that global variables should not be used; functions should be limited in length, complexity, and number of input parameters; modules should have a size limit and focus on a single task; and the system should be separated into multiple abstraction layers, with only limited dependencies between its modules and layers. All of these rules are widely accepted guidelines for producing high-quality code and are used both in and outside of safety-critical systems. Yet one can also find more restrictive recommendations, which are not typically followed in conventional solutions, such as:
- no dynamic allocation: safety-critical systems may run continuously for over 10 years, and memory fragmentation could render them inoperable;
- no recursion: it can lead to stack overflow;
- limited use of pointers: pointers can lead to overwriting unrelated data or, in the case of function pointers, executing the wrong code.
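The three restrictions above can be combined in one minimal sketch. It assumes a hypothetical singly linked list (list_t, node_t, POOL_CAPACITY and the functions are invented for this example) whose nodes live in a fixed-size array instead of the heap, are linked by array indices instead of pointers, and are traversed by a loop with a hard iteration bound instead of recursion.

```c
#include <stdbool.h>
#include <stdint.h>

#define POOL_CAPACITY 8U
#define INDEX_NONE    0xFFU /* marks "no node" */

typedef struct {
    int32_t value;
    uint8_t next;   /* index of the next node, or INDEX_NONE */
    bool    in_use;
} node_t;

typedef struct {
    node_t  nodes[POOL_CAPACITY]; /* fixed storage, no malloc/free */
    uint8_t head;
} list_t;

void list_init(list_t *list)
{
    for (uint8_t i = 0U; i < POOL_CAPACITY; i++) {
        list->nodes[i].value = 0;
        list->nodes[i].next = INDEX_NONE;
        list->nodes[i].in_use = false;
    }
    list->head = INDEX_NONE;
}

/* Inserts at the front; returns false when the pool is exhausted. */
bool list_push(list_t *list, int32_t value)
{
    for (uint8_t i = 0U; i < POOL_CAPACITY; i++) {
        if (!list->nodes[i].in_use) {
            list->nodes[i].value = value;
            list->nodes[i].next = list->head;
            list->nodes[i].in_use = true;
            list->head = i;
            return true;
        }
    }
    return false;
}

/* Iterative traversal with a hard bound on the number of steps.
 * Returns false if an index is out of range or the chain is longer
 * than the pool, which indicates corrupted links. */
bool list_sum(const list_t *list, int64_t *sum_out)
{
    int64_t  sum = 0;
    uint8_t  index = list->head;
    uint32_t steps = 0U;

    while (index != INDEX_NONE) {
        if ((index >= POOL_CAPACITY) || (steps >= POOL_CAPACITY)) {
            return false;
        }
        sum += list->nodes[index].value;
        index = list->nodes[index].next;
        steps++;
    }
    *sum_out = sum;
    return true;
}
```

Because every index is checked against the pool size before use, a corrupted link is reported as an error rather than read as foreign memory, and the worst-case memory and execution time are known up front.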
Verification and testing
The third set of guidelines concerns verification and testing. It states that tests should be performed on various levels (unit, integration, system) and should also cover non-functional aspects, such as performance. These should be both white- and black-box tests. The standard also recommends monitoring code coverage, up to Modified Condition/Decision Coverage (MC/DC) if possible, and supporting the process with metrics, static analysis, dynamic analysis, and Code Review.
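To make MC/DC more concrete, here is a minimal sketch built on a hypothetical decision with three conditions (may_depart and its parameters are invented for this example). MC/DC requires showing that each condition independently affects the outcome. For three conditions, four test cases are enough here: each condition has a pair of cases that differ only in that condition and produce different results.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical decision under test. */
bool may_depart(bool doors_closed, bool signal_clear, bool override_active)
{
    return doors_closed && (signal_clear || override_active);
}

typedef struct {
    bool doors_closed;
    bool signal_clear;
    bool override_active;
    bool expected;
} mcdc_case_t;

/* Independence pairs:
 *   doors_closed:    case 0 vs case 1
 *   signal_clear:    case 0 vs case 2
 *   override_active: case 2 vs case 3 */
static const mcdc_case_t mcdc_cases[] = {
    { true,  true,  false, true  }, /* case 0 */
    { false, true,  false, false }, /* case 1 */
    { true,  false, false, false }, /* case 2 */
    { true,  false, true,  true  }, /* case 3 */
};

/* Runs all cases and returns the number of mismatches. */
size_t run_mcdc_cases(void)
{
    const size_t count = sizeof(mcdc_cases) / sizeof(mcdc_cases[0]);
    size_t failures = 0U;

    for (size_t i = 0U; i < count; i++) {
        const mcdc_case_t *tc = &mcdc_cases[i];
        const bool actual = may_depart(tc->doors_closed,
                                       tc->signal_clear,
                                       tc->override_active);
        if (actual != tc->expected) {
            failures++;
        }
    }
    return failures;
}
```

Exhaustive testing of the same decision would need eight cases; the gap grows quickly with the number of conditions, which is why MC/DC is a practical target for complex decisions.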
As we can see, the standards contain many good practices well known in other areas of programming. They have been well documented and analyzed, and have repeatedly proved their effectiveness. On the other hand, some techniques are used specifically in safety-critical systems, as those systems need to be developed under particular scrutiny.
Written by: Maciej Gajdzica, Senior Software Developer with unique experience in building life&death systems at Solwit SA