Source lines of code



         


Source lines of code (SLOC) is a software metric used to measure the amount of code in a software program. SLOC is typically used to estimate the amount of effort that will be required to develop a program, as well as to estimate productivity or effort once the software is produced.


Measuring SLOC

There are two major types of SLOC measures: physical SLOC and logical SLOC. Specific definitions of these two measures vary, but the most common definition of physical SLOC is a count of "non-blank, non-comment lines" in the text of the program's source code. Logical SLOC measures attempt to count the number of "statements", but their specific definitions are tied to specific computer languages (one simple logical SLOC measure for C-like languages is the number of statement-terminating semicolons). It is much easier to create tools that measure physical SLOC, and physical SLOC definitions are easier to explain. However, physical SLOC measures are sensitive to logically irrelevant formatting and style conventions, while logical SLOC is less sensitive to them. Unfortunately, SLOC measures are often stated without giving their definition, and logical SLOC can often be significantly different from physical SLOC.
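The difference between the two measures can be illustrated with a minimal sketch for C-like source, assuming the common definitions given above: physical SLOC as non-blank, non-comment lines, and logical SLOC as the count of semicolons. The helper names (`strip_c_comments`, `physical_sloc`, `logical_sloc`) are hypothetical, and the comment stripping is deliberately simplified: it does not parse string or character literals, and it counts both semicolons in a `for (...; ...; ...)` header, which is a known quirk of semicolon counting.

```python
import re


def strip_c_comments(source: str) -> str:
    """Remove /* ... */ and // comments from C-like source (simplified).

    String and character literals are NOT parsed, so "//" or "/*" inside a
    literal would be misread as the start of a comment.
    """
    # Remove block comments first, keeping any newlines they spanned so
    # that the line structure of the file is preserved.
    source = re.sub(
        r"/\*.*?\*/",
        lambda m: "\n" * m.group(0).count("\n"),
        source,
        flags=re.DOTALL,
    )
    # Then remove line comments up to the end of each line.
    return re.sub(r"//[^\n]*", "", source)


def physical_sloc(source: str) -> int:
    """Physical SLOC: number of non-blank, non-comment lines."""
    code = strip_c_comments(source)
    return sum(1 for line in code.splitlines() if line.strip())


def logical_sloc(source: str) -> int:
    """Crude logical SLOC: number of statement-terminating semicolons."""
    return strip_c_comments(source).count(";")


# The same three statements, laid out in two different styles.
compact = "int main(void) { int x = 1; x++; return x; }\n"
spread = """
/* Same statements, different layout */
int main(void)
{
    int x = 1;  // initialise

    x++;
    return x;
}
"""

print(physical_sloc(compact), logical_sloc(compact))  # 1 3
print(physical_sloc(spread), logical_sloc(spread))    # 6 3
```

The physical count changes sixfold with layout alone, while the logical count stays the same, which is the formatting sensitivity described above.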

Even the "logical" and "physical" SLOC values can have a large number of varying definitions. Robert E. Park et al. developed a framework for defining SLOC values, to enable people to carefully explain and define the SLOC measure used in a project. For example, most software systems reuse code, and determining which (if any) reused code to include is important when reporting a measure.


Usage of SLOC measures

SLOC measures are somewhat controversial, particularly in the way that they are sometimes misused. Experiments have repeatedly confirmed that effort is highly correlated with SLOC; that is, programs with larger SLOC values take more time to develop. Thus, SLOC can be very effective in estimating effort. However, functionality is less well correlated with SLOC: skilled developers may be able to develop the same functionality with far less code, so a program with fewer SLOC may exhibit more functionality than another similar program. In particular, SLOC is a poor measure of individual productivity, since a developer who writes only a few lines may be far more productive than one who writes code simply to produce a larger number of lines. Good developers may merge several code modules into a single module, improving the system yet appearing to have negative productivity because they remove code. Skilled developers also tend to be assigned the most difficult tasks, which often yield fewer lines per unit of time, so they may appear less productive by this measure.
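The "negative productivity" effect follows directly from measuring productivity as net SLOC change per unit of time. A minimal sketch, assuming that naive definition; the function name and all numbers are hypothetical and chosen only to illustrate the arithmetic:

```python
def sloc_per_day(sloc_before: int, sloc_after: int, days: float) -> float:
    """Naive 'productivity': net change in SLOC divided by working days."""
    return (sloc_after - sloc_before) / days


# Hypothetical developer A pads the code base: 10,000 -> 12,000 SLOC in 10 days.
padder = sloc_per_day(10_000, 12_000, 10)

# Hypothetical developer B merges duplicated modules: 12,000 -> 10,500 SLOC
# in 10 days, leaving a smaller and arguably better system.
merger = sloc_per_day(12_000, 10_500, 10)

print(padder)  # 200.0
print(merger)  # -150.0
```

By this metric developer B scores below zero even though the change improved the system, which is why SLOC is considered a poor measure of individual productivity.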

There are several cost, schedule, and effort estimation models which use SLOC as an input parameter, including the widely used COnstructive COst MOdel (COCOMO) series of models by Barry Boehm et al. While these models have shown good predictive power, they are only as good as the estimates (particularly the SLOC estimates) fed to them. Many have advocated the use of function points instead of SLOC as a measure of functionality, but since function points are highly correlated with SLOC (and cannot be automatically measured), this is not a universally held view.
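To show how such a model turns a SLOC estimate into effort and schedule, here is a minimal sketch of Basic COCOMO (Boehm, 1981), the simplest member of the series, using its published coefficients for the organic, semi-detached, and embedded project modes. Later models such as COCOMO II add many cost drivers not modelled here; the function and class names are hypothetical.

```python
from dataclasses import dataclass

# Basic COCOMO coefficients (a, b, c, d) per project mode:
#   effort   = a * KSLOC ** b   (person-months)
#   schedule = c * effort ** d  (calendar months)
COEFFICIENTS = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


@dataclass
class Estimate:
    effort_pm: float   # effort in person-months
    schedule_m: float  # development time in calendar months
    staff: float       # average headcount = effort / schedule


def basic_cocomo(sloc: int, mode: str = "organic") -> Estimate:
    """Estimate effort and schedule from a SLOC count with Basic COCOMO."""
    a, b, c, d = COEFFICIENTS[mode]
    ksloc = sloc / 1000  # the model is expressed in thousands of SLOC
    effort = a * ksloc ** b
    schedule = c * effort ** d
    return Estimate(effort, schedule, effort / schedule)


est = basic_cocomo(50_000)
print(f"{est.effort_pm:.0f} person-months over {est.schedule_m:.1f} months, "
      f"about {est.staff:.1f} people on average")
```

Because the exponent `b` is greater than 1, effort grows faster than linearly with size, so any error in the SLOC input is amplified in the output, which illustrates the point that these models are only as good as the estimates fed to them.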

According to Gary McGraw, the SLOC values for various versions of Microsoft's Windows operating system are as follows (estimated from his graph; he does not specify whether these are physical or logical measures, or a mixture):


Year   Operating system   SLOC (millions)
1990   Windows 3.1         3
1995   Windows NT          4
1997   Windows 95         15
1998   Windows NT 4.0     16
1999   Windows 98         18
2000   Windows NT 5.0     20
2001   Windows 2000       35
2002   Windows XP         40


David A. Wheeler studied the Red Hat distribution of the GNU/Linux operating system and reported that Red Hat Linux version 7.1 (released April 2001) contained over 30 million physical SLOC. He also determined that, had it been developed by conventional proprietary means, it would have required about 8,000 person-years of development effort and would have cost over $1 billion (in year 2000 U.S. dollars). A similar study was later made of Debian GNU/Linux version 2.2 (also known as "Potato"), which was originally released in August 2000. This study found that Debian GNU/Linux 2.2 included over 55,000,000 SLOC and, if developed in a conventional proprietary way, would have required 14,005 person-years and cost $1.9 billion to develop.
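The cost figures in these studies follow from multiplying estimated effort by a fully loaded cost per person-year. A minimal sketch that derives the implied rate from the Debian figures quoted above and, assuming (as a hypothesis, not a documented fact) that the Red Hat study used a similar rate, checks it against the Red Hat figures:

```python
# Figures quoted above for Debian GNU/Linux 2.2.
DEBIAN_PERSON_YEARS = 14_005
DEBIAN_COST_USD = 1.9e9

# Implied fully loaded cost of one person-year in the Debian study.
cost_per_person_year = DEBIAN_COST_USD / DEBIAN_PERSON_YEARS

# Assumption: apply the same rate to the ~8,000 person-years quoted
# for Red Hat Linux 7.1.
REDHAT_PERSON_YEARS = 8_000
redhat_cost = REDHAT_PERSON_YEARS * cost_per_person_year

print(f"implied cost per person-year: ${cost_per_person_year:,.0f}")
print(f"Red Hat 7.1 at that rate: ${redhat_cost / 1e9:.2f} billion")
# implied cost per person-year: $135,666
# Red Hat 7.1 at that rate: $1.09 billion
```

Under that assumption the result is consistent with the "over $1 billion" reported for Red Hat Linux 7.1.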


SLOC and relation to security faults

"The central enemy of reliability is complexity." (Geer et al.)

A number of experts have claimed a relationship between the number of lines of code in a program and the number of bugs that it contains. This relationship is not simple, since the number of errors per line of code varies greatly according to the language used, the type of quality assurance processes, and the level of testing, but it does appear to exist. More importantly, the number of bugs in a program has been directly related to the number of security faults that are likely to be found in the program.
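The simplest way to express such a relationship is a defect density: an expected number of defects per thousand SLOC (KSLOC), multiplied by program size. The sketch below assumes this linear model; the function name and the density values are hypothetical, chosen only to show how the estimate scales, since real densities vary with language, QA process, and testing as described above.

```python
def expected_defects(sloc: int, defects_per_ksloc: float) -> float:
    """Expected latent defects under a simple linear defect-density model."""
    return defects_per_ksloc * sloc / 1000


# Hypothetical densities and program sizes, for illustration only.
for density in (0.5, 2.0):
    for size in (100_000, 1_000_000):
        print(f"{size:>9,} SLOC at {density} defects/KSLOC -> "
              f"{expected_defects(size, density):,.0f} expected defects")
```

At any fixed density, a program ten times larger is expected to contain ten times as many defects, and hence, by the argument above, more security faults.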

This has had a number of important implications for system security, and these can be seen reflected in operating system design. Firstly, more complex systems are likely to be more insecure simply due to the greater number of lines of code needed to develop them. For this reason, security-focused systems such as OpenBSD grow much more slowly than other systems such as Windows and Linux. A second idea, taken up in both OpenBSD and many Linux variants, is that by separating code into different sections which run with different security environments (with or without special privileges, for example) and ensuring that the most security-critical segments are small and carefully





This article is from Wikipedia. All text is available under the terms of the GNU Free Documentation License.