| |||||||||
In computer science, a linked list is one of the simplest data structures used in computer programming. It consists of a sequence of nodes, each containing arbitrary data fields and one or two references ("links") pointing to the next and/or previous nodes. Linked lists permit insertion and removal of nodes at any point in the list in constant time, but do not allow random access.
The simplest kind of linked list is a singly linked list, which has one link per node. This link points to the next node in the list, or to a null value or empty list if it is the last node.
To specify a singly-linked list (e.g. when handling it over to a procedure) it suffices to specify the address of its first element.
A more sophisticated kind of linked list is a doubly linked list. Each node has two links, one to the previous node and one to the next.
In some very low level languages, Xor-linking offers a way to implement doubly-linked lists using a single word for both links, although the use of this technique is usually discouraged.
In a circularly linked list, the first and last nodes are linked together. This can be done for both singly and doubly linked lists. To traverse a circular linked list, you begin at any node and follow the list in either direction until you return to the original node.
Singly linked lists are often handled by giving the address of the last element — because that element also provides quick access to the first one, whereas the converse is not true.
Linked lists sometimes have a special dummy or sentinel node at the beginning and/or at the end of the list, which is not used to store data. Its purpose is to simplify or speed up some operations, by ensuring that every data node always has a previous and/or next node, and that every list (even one that contains no data elements) always has a "first" and "last" node.
Linked lists are used as a building block for many other data structures, such as stacks, queues and their variations.
The "data" field of a node can be another linked list. By this device, one can construct many linked data structures with lists; this practice originated in the Lisp programming language, where linked lists are a primary (though by no means the only) data structure, and is now a common feature of the functional programming style.
Sometimes, linked lists are used to implement associative arrays, and are in this context called association lists. There is very little good to be said about this use of linked lists; they are easily outperformed by other data structures such as self-balancing binary search trees even on small data sets (see the discussion in associative array). However, sometimes a linked list is dynamically created out of a subset of nodes in such a tree, and used to more efficiently traverse that set.
As for most choices in computer programming and design, no method is well suited to all circumstances. A linked list data structure might work well in one case, and cause problems in another. This is a list of some of the common tradeoffs involving linked list structures. In general, if you have a dynamic collection, where elements are frequently being added and deleted, and the location of new elements added to the list is significant, then benefits of a linked list increase.
Linked lists have several advantages over arrays. For one thing, a linked list allows a new element to be inserted or deleted in any position, with a constant number of operations (changing a couple of pointers). Linked lists can also save memory in applications that need one or more lists of unpredictable size, since the total space used will be proportional to the number of elements actually present in all lists (as opposed to the number of lists times the maximum size of each list).
Further memory savings can be achieved, in certain cases, by sharing the same tail elements among two or more lists. In this way, one can add new elements to the front of the list while keeping a reference to both the new and the old versions — which provides the simplest example of a persistent data structure.
On the other hand, compared to an array, the chief disadvantage of a singly linked list is that it only allows sequential access to elements, and only in one direction (from front to back). In particular, when removing or inserting an element in the middle of the list, one needs to know the address of the preceding element.
Another disadvantage of linked lists is the extra storage needed for the address fields, which often makes them impractical for lists of small data items such as characters or boolean values. Even more significant is the time cost of allocating memory separately for each new element (and, in some systems, of locating and recycling elements that cannot be reached; see garbage collection). There are a number of techniques to reduce this overhead, such as using a free list or memory pool, storing the data itself inside the nodes, and CDR coding.
Double-linked lists require more space per node (unless one uses xor-linking), and their elementary operations are more expensive; but they are often easier to manipulate because they allow sequential access to the list in both directions. In particular, one can insert or delete a node in a constant number of operations given only that node's address. Some algorithms, such as quicksort, require access in both directions. On the other hand, they do not allow tail-sharing, and cannot be used as persistent data structures.
Circular linked lists are most useful for describing naturally circular structures, and have the advantage of regular structure and being able to traverse the list starting at any point. They also allow quick access to the first and last records through a single pointer (the address of the last element). Their main disadvantage is the complexity of iteration, which has subtle special cases.
Sentinel nodes may simplify certain list operations, by ensuring that the next and/or previous nodes exist for every element. However sentinel nodes use up extra space (especially in applications that use many short lists), and they may complicate other operations.
When manipulating linked lists in-place, care must be taken to not use values that you have invalidated in previous assignments. This makes algorithms for inserting or deleting linked list nodes somewhat subtle. This section gives pseudocode for adding or removing nodes from singly, doubly, and circularly linked lists in-place. Throughout we will use null to refer to an end-of-list marker or sentinel, which may be implemented in a number of ways.
Our node data structure will have two fields. We also keep a variable head which always points to the first node in the list, or is null for an empty list.
The following is wikicode, a proposed pseudocode for BambooWeb articles.
Traversal of a singly-linked list is easy:
The following code inserts a node after an existing node in a singly linked list. Inserting a node before an existing one cannot be done; instead, you have to locate it while keeping track of the previous node.
Inserting at the beginning of the list requires a separate function. This requires updating head.
Similarly, we have functions for removing the node after a given node, and for removing a node from the beginning of the list. To find and remove a particular node, one must again keep track of the previous element.
Notice that removeBeginning() sets head to null when removing the last node in the list.
With doubly-linked lists there are even more pointers to update, but also less information is needed, since we can use backwards pointers to observe preceding elements in the list. This enables new operations, and eliminates special-case functions. We will add a prev field to our nodes, pointing to the previous element, and a tail variable which always points to the last node in the list. Both head and tail are null for an empty list.
Iterating through a doubly linked list can be done in either direction. In fact, direction can change many times, if desired.
Forwards
Backwards
These symmetric functions add a node either after or before a given node:
We also need a function to insert a node at the beginning of a possibly-empty list:
A symmetric function inserts at the end:
Removing a node is easier, only requiring care with head and tail:
One subtle consequence of this procedure is that deleting the last element of a list sets both head and tail to null.
Circularly-linked lists can be either singly or doubly linked. For both, their circular versions, which link all the nodes into a continuous circle without using null, benefit from the ability to traverse the full list beginning at any given node. This often allows us to avoid storing head and tail, although if the list may be empty we need a special representation for the empty list, such as a head variable which points to some node in the list or is null if it's empty; we use such a head here. This representation significantly simplifies adding and removing nodes with a non-empty list, but empty lists are then a special case.
Assuming that someNode is some node in a non-empty list, this code iterates through that list starting with someNode:
Forwards
Backwards
Notice the postponing of the test to the end of the loop. This is important for the case where the list contains only the single node someNode.
This simple function inserts a node into a doubly-linked circularly linked list after a given element:
Inserting an element in a possibly empty list requires a special function:
Finally, removing a node must deal with the case where the list empties:
Languages that don't support any type of reference can still create links by replacing pointers with array indices. The approach is to keep an array of records, where each record has integer fields indicating the index of the next and (possibly) previous node in the array. Not all nodes in the array need be used. If even records are not supported, parallel arrays can often be used instead.
As an example, consider the following type structure in Visual Basic:
By creating an array of these structures, and an integer variable to store the index of the head element, a linked list can be built:
Links between elements are formed by placing the array index of the next (or previous) cell into the Next or Prev field within a given element. For example:
| Index | Next | Prev | Name | Balance |
|---|---|---|---|---|
| 0 | 1 | 4 | Jones, John | 123.45 |
| 1 | -1 | 0 | Smith, Joseph | 234.56 |
| 2 | 4 | -1 | Adams, Adam | 0.00 |
| 3 | Ignore, Ignatius | 999.99 | ||
| 4 | 0 | 2 | Another, Anita | 876.54 |
| 5 | ||||
| 6 | ||||
| 7 |
In the above example, ListHead would be set to 2, the location of the first entry in the list. Notice that entry 3 and 5 through 7 are not part of the list. These cells are available for any additions to the list. By creating a ListFree integer variable, a serialized for storage on disk or transfer over a network.
This approach has one main disadvantage, however: it creates and manages a private memory space for its nodes. Not only does this increase complexity of the implementation, but growing it may be difficult or impossible, because it is large, whereas finding space for a new linked list node in a large, general memory pool is easier. This slow growth also affects algorithmic performance, as it causes a few insert operations to unexpectedly take linear (O(n)) instead of constant time (although it's still amortized constant). Using a general memory pool also leaves more memory for other data if the list is smaller than expected or if many nodes are freed.
Many programming languages, such as Lisp and Scheme have singly linked lists built in. In many functional languages, these lists are constructed from nodes, each called a cons or cons cell. The cons has two fields: the car, a reference to the data for that node, and the cdr, a reference to the next node. Although cons cells can be used to build other data structures, this is their primary purpose.
In other languages, linked list are typically built using references together with records. Here is a complete example in C:
When constructing a linked list, one is faced with the choice of whether to store the data of the list directly in the linked list nodes, called internal storage, or to merely store a reference to the data, called external storage. Internal storage has the advantage of making access to the data more efficient, requiring less storage overall, having better locality of reference, and simplifying memory management for the list (its data is allocated and deallocated at the same time as the list nodes).
External storage, on the other hand, has the advantage of being more generic, in that the same data structure and machine code can be used for a linked list no matter what the size of the data is. It also makes it easy to place the same data in multiple linked lists. Note, however, that even with internal storage the same data can be placed in multiple lists, either by including multiple next references in the node data structure, or by having other linked lists store references to the nodes of the linked list containing the data.
A common misconception is that every list using internal storage requires a new set of functions to operate on the linked list, consuming more effort and code space than an approach using external storage. This is true with a naive approach, but as long as the next pointers are located in the same place in each record, it is possible to create a generic implementation of linked lists with internal storage that only needs to be supplied for each type of list a way of creating and destroying nodes.
Suppose you wanted to create a linked list of families and their members. Using internal storage, the structure might look like the following:
To print a complete list of families and their members using internal storage, we could write:
Using external storage, we would create the following structures:
To print a complete list of families and their members using external storage, we could write:
Notice that when using external storage, an extra step is needed to extract the record from the node and cast it into the proper data type.
The skip list is a linked list augmented with layers of pointers for quickly jumping over large numbers of elements, and then descending to the next layer. This process continues down to the bottom layer, which is the actual list.
A binary tree can be seen as a type of linked list where the elements are themselves linked lists of the same nature. The result is that each node may include a reference to the head node of one or two other linked lists, which, together with their contents, form the subtrees below that node.