The beginning of the way
Today I would like to talk about such an interesting topic as the “
Java Collections Framework ” or, in simple terms, about collections. Most of the code's work is processing data in one form or another. Get a list of users, get a list of addresses, etc. Somehow sort them, perform a search, compare them. This is why knowledge of collections is considered a core skill. That's why I want to talk about it. In addition, one of the most common questions in Java developer interviews is collections. For example, "draw a hierarchy of collections." The online compiler will help us on our way. For example, you can use " Tutorialspoint
Online Java Compiler " or
Repl.it. The path to getting to know any data structures begins with ordinary variables (Variables). On the Oracle website, various topics are represented as "paths" or Trails. So, the path to getting to know Java is called “
Trail: Learning the Java Language: Table of Contents ”. And the basics of the language (i.e. Language Basics) start with Variables. Therefore, let's write a simple code:
public static void main(String[] args) {
String user = "Max";
System.out.println("Hello, " + user);
}
It is good in everything, except that we understand that this code is good and beautiful only for one variable. What to do if there are several of them? Arrays were invented to store data of one type. In the same Trail from Oracle there is a separate section dedicated to arrays. This section is called: "
Arrays ". Working with arrays is also quite simple:
import java.util.Arrays;
class Main {
public static void main(String[] args) {
String[] users = new String[2];
users[0] = "Max";
users[1] = "John";
System.out.println("Hello, " + Arrays.toString(users));
}
}
Arrays solve the problem of storing multiple values in one place. But it imposes a limitation: the array size is constant. If, as in the example, we said that size = 2, then it is equal to two. That's all. If we want a larger array, we need to create a new instance. Also, finding an element is also a tricky thing for an array. There is a method
Arrays.binarySearch
, but this search only works on a sorted array (for an unsorted array, the result is undefined or simply unpredictable). That is, the search will oblige us to sort each time. Deleting also just clears the value. Therefore, we still don’t know how much data is actually in the array, we only know how many cells are in the array. To refresh your knowledge about arrays:
And as a consequence of the development of the Java language, the Java Collections Framework appeared in JDK 1.2, which we will talk about today.
Collection
Start costing from the very beginning. Why Collections? The term itself comes from things like "Type Theory" and "Abstract Data Types". But if you don’t look at any high matters, then when we have several things, we can call them a “collection of things.” Those who collect items. In general, the word collect itself comes from Lat. collectio "gathering, collecting." That is, a collection is a collection of something, a container for some elements. So we have a collection of elements. What we might want to do with it:
As you can see, we may want quite logical things. We also understand that we may want to do something with multiple collections:
Accordingly, the Java developers wrote the java.util.Collection interface to describe this general behavior for all collections . The Collection interface is where all collections originate. Collection is an idea, it is an idea of how all collections should behave. Therefore, the term "Collection" is expressed as an interface. Naturally, an interface needs implementations. The interface
java.util.Collection
has an abstract class
AbstractCollection
, that is, some “abstract collection”, which represents the skeleton for other implementations (as written in the JavaDoc above the class
java.util.AbstractCollection
). Speaking about collections, let's go back and remember that we want to iterate over them. This means that we want to iterate through the elements one by one. This is a very important concept. Therefore, the interface
Collection
is inherited from
Iterable
. This is very important because... firstly, everything Iterable must be able to return an Iterator based on its contents. And secondly, everything that Iterable can be used in loops
for-each-loop
. And it is with the help of an iterator
AbstractCollection
that methods such as
contains
,
toArray
, are implemented
remove
. And the path to understanding collections begins with one of the most common data structures - a list, i.e.
List
.
Lists
So, lists occupy an important place in the hierarchy of collections:
As we can see, lists implement the
java.util.List interface . Lists express that we have a collection of elements that are arranged in some sequence one after another. Each element has an index (like in an array). Typically, a list allows you to have elements with the same value. As we said above,
List
it knows about the index of the element. This allows you to get (
get
) an element by index or set a value for a specific index (
set
). Collection methods
add
,
addAll
,
remove
allow you to specify the index from which to execute them. Additionally, y
List
has its own version of an iterator called
ListIterator
. This iterator knows about the index of the element, so it can iterate not only forward, but also backward. It can even be created from a specific place in the collection. Among all the implementations, there are two most commonly used:
ArrayList
and
LinkedList
. First,
ArrayList
it is a list (
List
) based on an array (
Array
). This allows you to achieve "Random Access"
to elements. Random access is the ability to immediately retrieve an element by index, rather than iterate through all the elements until we find the element with the desired index. It is the array as a basis that allows this to be achieved. On the contrary,
LinkedList
it is a Linked List. Each entry in a linked list is represented in the form
Entry
, which stores the data itself, as well as a link to the next (next) and previous (previous)
Entry
. Thus
LinkedList
implements "Sequential Access
" . It is clear that to find the 5th element we will have to go from the first element to the last, because we don't have direct access to the fifth element. We can only access it from the 4th element. The difference in their concept is given below:
In work, as you understand, there is also a difference. For example, adding elements. The
LinkedList
elements are simply connected like links in a chain. But
ArrayList
it stores elements in an array. And an array, as we know, cannot change its size. How does it work then
ArrayList
? And it works very simply. When the space in the array runs out, it increases by 1.5 times. Here is the zoom code:
int newCapacity = oldCapacity + (oldCapacity >> 1);
Another difference in operation is any offset of elements. For example, when adding or removing elements to the middle. To remove from
LinkedList
an element, simply remove references to this element. In the case of ,
ArrayList
we are forced to shift the elements each time using the
System.arraycopy
. Thus, the more elements, the more actions will have to be performed. A more detailed description can be found in these articles:
Having examined ArrayList, one cannot help but remember its “predecessor”, the
java.util.Vector class . It differs
Vector
in
ArrayList
that the methods for working with the collection (adding, deleting, etc.) are synchronized. That is, if one thread (
Thread
) adds elements, then other threads will wait until the first thread finishes its work. Since thread safety is often not required, it is recommended to use the class in such cases
ArrayList
, as explicitly stated in the JavaDoc for the class
Vector
. In addition,
Vector
it increases its size not by 1.5 times,
ArrayList
but by 2 times. Otherwise, the behavior is the same -
Vector
the storage of elements in the form of an array is hidden and adding/removing elements has the same consequences as in
ArrayList
. In fact,
Vector
we remembered this for a reason. If we look in the Javadoc, we will see in "Direct Known Subclasses" a structure such as
java.util.Stack . The stack is an interesting structure that is a LIFO
last-in-first-out
(last in, first out) structure. Stack translated from English is a stack (like a stack of books, for example). The stack implements additional methods:
peek
(look, look),
pop
(push),
push
(push). The method
peek
is translated as look (for example,
peek inside the bag is translated as “
look inside the bag ”, and
peek through the keyhole is translated as “
peek through the keyhole ”). This method allows you to look at the “top” of the stack, i.e. get the last element without removing (i.e. without removing) it from the stack. The method
push
pushes (adds) a new element onto the stack and returns it, and the element method
pop
pushes (removes) and returns the removed one. In all three cases (i.e. peek, pop and push), we only work with the last element (i.e. the “top of the stack”). This is the main feature of the stack structure. By the way, there is an interesting task to understand stacks, described in the book “A Programmer’s Career” (Cracking Coding Interview). There is an interesting task where using the “stack” structure (LIFO) you need to implement the “queue” structure (FIFO). It should look like this:
Analysis of this task can be found here: "
Implement A Queue Using Stacks - The Queue ADT ("Implement Queue Using Stacks" on LeetCode) ". So we smoothly move on to a new data structure - a queue.
Queue
A Queue is a structure familiar to us from life. Queues to shops, to doctors. Whoever came first (First In) will be the first to leave the queue (First Out). In Java, a queue is represented by the
java.util.Queue interface . According to the queue's Javadoc, the queue adds the following methods:
As you can see, there are order methods (failure to execute them is fraught with an exception) and there are request methods (the inability to execute them does not lead to errors). It is also possible to get the element without removing it (peek or element). The queue interface also has a useful successor -
Deque . This is the so-called "two-way queue". That is, such a queue allows you to use this structure both from the beginning and from the end. The documentation says that "Deques can also be used as LIFO (Last-In-First-Out) stacks. This interface should be used in preference to the legacy Stack class.", that is, it is recommended to use Deque implementations instead of Stack. The Javadoc shows what methods the Deque interface describes:
Let's see what implementations there are. And we’ll see an interesting fact - LinkedList has gotten into the queue camp) That is, LinkedList implements both
List
, and
Deque
. But there are also “queues only”, for example
PriorityQueue
. She is not often remembered, but in vain. Firstly, you cannot use "non-comparable objects" in this queue, i.e. either Comparator must be specified or all objects must be comparable. Secondly, "this implementation provides O(log(n)) time for the enqueuing and dequeuing methods". Logarithmic complexity is there for a reason. Implemented PriorityQueue based on the heap. The Javadoc says: "Priority queue represented as a balanced binary heap." The storage itself for this is a regular array. Which grows when necessary. When the heap is small, it increases 2 times. And then by 50%. Comment from the code: "Double size if small; else grow by 50%". Priority queue and Binary Heap are a separate topic. So for more information:
As an implementation,
java.util.Deque
you can use the
java.util.ArrayDeque class . That is, lists can be implemented using a linked list and an array, and queues can also be implemented using an array or using a linked list.
Queue
The and interfaces
Deque
have descendants representing the "blocking queue":
BlockingQueue
and
BlockingDeque
. Here is the interface change compared to regular queues:
Let's look at some examples of blocking queues. But they are interesting. For example, BlockingQueue is implemented by:
PriorityBlockingQueue ,
SynchronousQueue , ArrayBlockingQueue,
DelayQueue ,
LinkedBlockingQueue . But
BlockingDeque
they implement everything from the standard Collection Frameworks
LinkedBlockingDeque
. Each queue is the topic of a separate review. And within the framework of this review, we will depict the class hierarchy not only with
List
, but also with
Queue
:
As we can see from the diagram, the interfaces and classes of the Java Collections Framework are highly intertwined. Let's add another branch of the hierarchy -
Set
.
Set
Set
— translated as “set.” It differs from a queue and a list
Set
in its greater abstraction over the storage of elements.
Set
- like a bag of objects, where it is unknown how the objects are located and in what order they lay. In Java, such a set is represented by the
java.util.Set interface . As the documentation says,
Set
this is a "collection that contains no duplicate elements". Interestingly, the interface itself
Set
does not add new methods to the interface
Collection
, but only clarifies the requirements (about what should not contain duplicates). In addition, from the previous description it follows that you cannot simply
Set
get an element from it. Iterator is used to get elements.
Set
has several more interfaces associated with it. The first one is
SortedSet
. As the name suggests,
SortedSet
it indicates that such a set is sorted, and therefore the elements implement the interface
Comparable
or are specified
Comparator
. In addition,
SortedSet
it offers several interesting methods:
In addition, there are methods
first
(smallest element by value) and
last
(largest element by value). There
SortedSet
is an heir -
NavigableSet
. The purpose of this interface is to describe the navigation methods needed to more accurately identify appropriate elements. An interesting thing is
NavigableSet
that it adds to the usual iterator (which goes from smallest to largest) an iterator for the reverse order -
descendingIterator
. In addition,
NavigableSet
it allows you to use the method
descendingSet
to obtain a view of yourself (View), in which the elements are in reverse order. This is called
View
because through the resulting element you can change the elements of the original one
Set
. That is, in essence, it is a representation of the original data in a different way, and not a copy of it. Interestingly,
NavigableSet
, like
Queue
, can handle
pollFirst
(minimal) and
pollLast
(maximal) elements. That is, it gets this element and removes it from the set. What kind of implementations are there? Firstly, the most famous implementation is based on a hash code -
HashSet . Another equally well-known implementation is based on a red-black tree -
TreeSet . Let's complete our diagram:
Within the collections, it remains to sort out the hierarchy - the hermits. Which at first glance stands aside -
java.util.Map
.
Maps
Maps are a data structure in which data is stored by key. For example, the key could be an ID or city code. And it is by this key that the data will be searched. It is interesting that the cards are displayed separately. According to the developers (see "
Java Collections API Design FAQ "), key-value mapping is not a collection. And maps can more quickly be thought of as a collection of keys, a collection of values, a collection of key-value pairs. This is such an interesting animal. What methods do cards provide? Let's look at the Java API interface
java.util.Map . Because Since maps are not collections (they do not inherit from Collections), they do not contain a
contains
. And this is logical. A map consists of keys and values. Which of these should the method check
contains
and how not to get confused? Therefore, the interface
Map
has two different versions:
containsKey
(contains a key) and
containsValue
(contains a value). Using it
keySet
allows you to get a set of keys (the same one
Set
). And using the method
values
we can get a collection of values in the map. The keys in the map are unique, which is emphasized by the data structure
Set
. Values can be repeated, which is emphasized by the Collection data structure. In addition, using the method
entrySet
we can obtain a set of key-value pairs. You can read more about what card implementations there are in the most detailed analyzes:
I would also like to see what is
HashMap
very similar to
HashSet
, and
TreeMap
to
TreeSet
. They even have similar interfaces:
NavigableSet
and
NavigableMap
,
SortedSet
and
SortedMap
. So our final map will look like this:
We can end with the interesting fact that the collection
Set
internally uses
Map
, where the added values are keys, and the value is the same everywhere. This is interesting because it
Map
is not a collection and returns
Set
, which is a collection but is in fact implemented as
Map
. A little surreal, but that’s how it turned out)
Conclusion
The good news is that this ends this review. The bad news is that this is a very review article. Each implementation of each of the collections deserves a separate article, and also for each algorithm hidden from our eyes. But the purpose of this review is to remember what they are and what the connections are between the interfaces. I hope that after thoughtful reading you will be able to draw a diagram of the collections from memory.
Well, as usual, some links:
#Viacheslav
GO TO FULL VERSION