Developer Interview: Debriefing Database Questions

Hi all! All of us here are fighting for the same goal - to become Java Developers . Perhaps the most important stage on the path of becoming is a technical interview. As a rule, the interviewer goes through the main topics, asking a couple of questions. In this article, we will talk about one of these key topics - databases . Let's consider the most common questions and try to answer them without deep immersion in the material, because in this case, the volume of the book will not be enough for us! So let's go. Developer interview: analysis of questions on databases - 1

Developer interview: analysis of questions on databases - 1

1. What are databases? What types are they divided into?

What is meant by DBMS?

Developer Interview: Debriefing Database Questions - 2

A database (DB) is an organized structure designed to store, modify and process interconnected information, mostly of large volumes. In other words, a database is a structured data store. For example, the phone book.

Database types

A relational database is a set of data with predefined relationships between them. The data is stored as a set of tables consisting of columns and rows. Tables store information about the objects represented in the database. Each column of the table stores a certain type of data, each cell stores an attribute value.
Non-relational systems (NoSQL) are systems designed for specific data models with flexible schemas. In other words, these are databases that store data not in the form of tabular schemes, rows and columns, but in other formats.

You can read more about non-relational databases in this article: NoSQL Developer's Guide . A database management system (DBMS) is a set of software with which a user can create databases (DB) and perform various operations on them: add, update, delete, select, etc. The DBMS guarantees the safety, integrity, data storage security and allows you to issue access to the database administration. An example is MySql, which is a DBMS that provides access to a relational database, or MongoDB for a non-relational one.

2. What is normalization? Normalized form? How many forms of normalization are there? Name the first three.

Normalization is the process of organizing, structuring the data in a database, which makes the database more flexible by eliminating redundancy and dependency inconsistencies. Normal form is a property of a table, considered in the context of normalization, which characterizes the table in terms of simplicity and correctness of structure construction. Normal form is defined as the set of requirements that a table must satisfy. There are six normal forms in total, but no more than the first three are used in practice:

First normal form:
- All attributes are simple (that is, atomic and indivisible);
- All data is scalar (that is, positive);
- There are no duplicate rows (for this, a primary key is created for each row).
Second normal form:
- The conditions of the first normal form are met;
- Each non-key attribute refers to a primary key.
Third normal form:
- The conditions of the second normal group are met;
- Non-key fields are independent of other non-key fields: they can only be associated with a primary key.

3. Denormalization

Denormalization- intentional reduction or violation of database normalization forms, usually - to speed up reading from the database by adding redundant data. In general, this is the reverse process of normalization. This happens because the theory of normal forms is not always applicable in practice. For example, non-atomic values are not always “evil”: sometimes even vice versa. In some cases, additional joins are needed when executing queries, especially when processing a large array of information. Ultimately, this can improve performance. For databases intended for analytics, denormalization is often performed to speed up query execution. For example, you will often sample some data for reports where non-key columns will be related.

4. Indexes

An index is a sorted set of values associated with a table or view with a specific column that speeds up data retrieval. That is, it is such an index: like the alphabet in the phone book, which helps us when searching by last name. If you use this feature correctly, you can greatly improve performance when working with large databases. And you can lower it a lot. To speed up the lookup, these keys are stored as a B-tree structure, which is searched. As a rule, indexes should be entered on the fields that are most frequently searched. You should think about creating an index no earlier than when you have at least 10 thousand records. Otherwise, you will not see a noticeable result, because premature optimization is EVIL. And how can an index hit system performance, you ask? When new data is inserted or old data is deleted, the B-tree structure will be recalculated. Actually, the more data and indexes, the more trees need to be recalculated. Imagine the situation: you have about 20,000 records and 7 indexes on this table. That is, when inserting data, you need to recalculate 7 trees, each of which has 20,000 records. Strictly speaking, it is not recommended to use indexes for tables where data will be frequently added/deleted. Finally, I note that indexes for columns in which the value often occurs nullwill not be as effective, so you should not add them to such columns.

What is the difference between clustered and non-clustered indexes in SQL?

Clustered:

Provides a physical order for the selected field;
If a table has a clustered index, it is called clustered;
You need at most one index per table;
In MySQL, a clustered index is not explicitly set by the user because if you don't define a PRIMARY KEY for your table, MySQL finds the first index UNIQUEwhere all key columns are NOT NULL, and InnoDB uses it as the clustered index.

Nonclustered:

Up to 999 non-clustered indexes are possible on a single table;
Contains a pointer to rows with actual data in the table;
Does not provide physical order;
For non-clustered indexes, there are separate tables with sorted data, namely, one table for one column on which the index is, therefore, when requesting data that is not part of this field, the query will first be performed on the field in this table, and only then - additional query for a row in the original table.

Creating a non-clustered index:

CREATE INDEX index_name ON table_name(column_name)

6. What is a composite index?

Composite index - built with sending to several columns at the same time. In other words, it is a complex index consisting of several columns. Such indexes are used when more than one column appears in one query. Create a composite index:

CREATE INDEX index_name ON table_name(first_column_name, second_column_name, third_column_name)

Typically, these indexes are used when the data in multiple columns is logically related.

7. What is a covering index? Unique index?

A covering index is an index that is sufficient to answer a query without accessing the table itself. By this index, you can get the entire row of data, but in fact this is simply not necessary. By not having to go directly to the original table, and answering using just the index, covering indexes is slightly faster to use. At the same time, do not forget that the more columns, the more cumbersome and slow the index itself becomes. So don't overuse it. Above, we talked about clustered and non-clustered indexes, which can be unique.. This means that no two fields have the same value for the index key. Otherwise, the index will not be unique, because several rows can contain the same value. An example of creating a unique non-clustered index:

CREATE UNIQUE INDEX index_name ON table_name(column_name)

8. What is a primary key

Primary key is a field in a table to identify each row in a database table. There can only be one such field in a table, and all values must be unique. Didn't you remember anything? Developer Interview: Debriefing Database Questions - 3

Developer Interview: Debriefing Database Questions - 3

After all, the primary key is nothing more than a unique, clustered index. As a rule, primary keys are created when a table is created:

CREATE TABLE table_name(
column_name int PRIMARY KEY,
…..)

A constraint will be automatically added to this column - NOT NULL. You can also set a key for an already created table:

ALTER TABLE table_name ADD PRIMARY KEY (column_name);

If the primary key is added in the manner described above, then the values of the fields specified as the primary key ( column_name) are checked for the absence of nulls (a constraint - will also be added NOT NULL).

What is a foreign key?

Foreign key (Foreign key) is a property created in order to provide a relationship between tables. Typically, a foreign key is set on columns in a sub table and points to one of the columns from the main table. Can be set as when creating a table:

CREATE TABLE table_name{
column_name int,
…..
FOREIGN KEY(column_name) REFERENCES another_table_name(another_table_column_name) }

And after creating the table:

ALTER TABLE table_name
ADD FOREIGN KEY(column_name) REFERENCES another_table_name(another_table_column_name));

You can set the behavior of a foreign key when manipulating the field it refers to. Manipulations can be of the form ON DELETEand ON UPDATE. Possible behaviors:

CASCADE- with this property, rows from the dependent table will be automatically deleted or changed when the related rows in the main table are deleted or changed;
SET NULLNULL- with this property, when a related row is deleted or updated from the main table, the value for the foreign key column will be set ;
NO ACTION- rejects attempts to delete or change rows in the main table if there are related rows in the dependent table;
RESTRICT- equivalent to NO ACTION;

SET DEFAULT

Usage example:

CREATE TABLE table_name{
column_name int,
…..
FOREIGN KEY(column_name) REFERENCES another_table_name(another_table_column_name) ON UPDATE CASCADE ON DELETE CASCADE }

ON DELETEIf the behavior for and is not explicitly set ON UPDATE, then the behavior will be set RESTRICT.

10. Types of connections between tables (Join)

Communication between tables is provided on the basis of common data (fields). This is done with an operator JOIN, an operation that matches rows from one table with rows in another. The mapping is done so that the columns of both tables are side by side, although they can be obtained from separate tables. And if we have common fields for three tables, we can display their data as one common table. However, keep in mind that the fewer tables are joined, the faster the query will run. So, types JOIN:

INNER JOIN- a connection that shows only the data from the first table, which corresponds to some data from the second table. The rest go down.
LEFT JOIN- a connection that shows all the data from the first table and the corresponding data from the second, if any. If there is no corresponding data, the data fields from the second table will be empty.
RIGHT JOIN- a join that shows all the data from the second table and the corresponding data from the first, if any. If there is no corresponding data, the data fields from the first table will be empty.
FULL JOIN- a connection that shows all the data from the first and second tables. If there is no related data in another table, the fields for this data will be empty.
CROSS JOIN- cross join, in which each row of the first table is connected with each row of the second table (each with each). That is, if two tables have 3 rows, after this join we will get a result of 9 rows.

Example Join(inner):

SELECT *
FROM first_table
INNER JOIN second_table ON first_table.some_column = second_table.some_column

11. What is the ACID property in a database?

A - Atomicity , ensures that no transaction is partially committed in the system. Either all of its sub-operations are executed, or none of them. For example, transferring money in a bank to another account is two operations:

Transfer money to a bank account.
Transfer money from a bank account to a specific account.

But anything can happen. For example, they will go to the bank, and then some error will occur and the second operation will not be performed. Or vice versa: only the second operation will be executed. Therefore, these actions are carried out within a single transaction, and the result is either all or nothing. C - Consistency : Each successful transaction always commits only resolvable results. This ensures that all restrictions are met (for example, NOT NULL), otherwise the transaction will be rolled back. I - isolation: during the execution of a transaction, parallel transactions should not affect its result. This gives us the ability to hide non-final data states from everyone. Actually, therefore, unsuccessful transactions cannot break anything. A little later, we will get acquainted with the levels of isolation of transactions. D - durability : if the transaction is completed, then you can be sure that the changes made by it will not be canceled due to any failure.

12. Transaction isolation levels

Each isolation level allows/prohibits certain actions (features):

phantom read - within the same transaction, the same data request gives different results, which occurs due to the addition of data by another (parallel) transaction.
non-repeating reading - within the same transaction, the same data request gives different results, which occurs due to data being changed or deleted by another (parallel) transaction.
"dirty" reading - reading data added or modified by a transaction that will not be rolled back later;
lost update - if one block of data is changed by different transactions at the same time, all changes are lost, except for the last one (similar to a “race condition” in multithreading).

For convenience, consider the isolation levels and their capabilities in the table:

Insulation levels	phantom reading	Non-repeating read	"dirty" reading	lost update
SERIALIZABLE	+	+	+	+
REPEATABLE_READ	-	+	+	+
READ_COMMITTED	-	-	+	+
READ_UNCOMMITTED	-	-	-	+

13. What is SQL injection?

SQL injection is one of the ways to hack a site, the essence of which is the introduction of some SQL code into the data through GETqueries POSTor Cookies. If a website performs such injections, it is possible to access the database and hack the application. For example, we know the name of some variable. Let's say column_namewith type boolean. If the system is susceptible to injections, we can add OR column_name=trueand then write everything we need from the database. ORwill create an OR condition, and our expression after it will always be true, which will skip us further. An attack on a site like SQL injection is possible due to incorrect processing of incoming data used in SQL queries. When connecting to a database using JDBC , you use variousStatements. To increase security, it should be used PreparedStatementinstead of the usual one Statement, since when using Statementthe query string and values are simply added together, making possible injections. In turn, there PreparedStatementis a specific request template, and data is inserted into it with quotes reflected. As a result, SQL injections will be perceived only as a string representation of some field. To protect against SQL injection, you can use regular expression-based validation (more on regular expressions can be found in this article ). Developer Interview: Debriefing Database Questions - 9

Developer Interview: Debriefing Database Questions - 9

Another option is to set a limit on the number of characters in input parameters: for example, if you should receive a number no more than 9999, a limit of four input characters will do. It will reduce the risk of being hacked using SQL injection.

Developer Interview: Debriefing Database Questions - 10

You can learn more about security in Java from the article “Security in Java: best practices” .

14. What are stored procedures? Stored functions? Trigger?

Stored procedures in SQL are an entity in the database, which is a set of SQL statements that is compiled once and stored on the server. In a word, this is an analogue of methods in Java. Stored procedures can perform actions on data as regular queries, as well as some actions that are not available for regular queries. A procedure is an SQL entity that is created once and then called with arguments. The advantage of this approach is that these instructions can be reused more than once. Stored procedures increase performance, enhance programming options, and support data security features. Consider creating a procedure:

CREATE PROCEDURE procedure_name (first_param some_type, second_param some_type..)
 begin
……...
 end

Procedure call:

CALL procedure_name (first_param, second_param…..);

A stored function is a type of stored procedure. The difference between a function is that it always returns only a single value, while a procedure always returns a set of values. Stored procedures cannot be mixed with regular SQL, while a stored function can be mixed - and this is its advantage. On the other hand, stored functions have many more limitations than procedures. Create a stored function:

CREATE FUNCTION function_name (first_param, second_param…..)
RETURNS some_type
 begin
……...
RETURN some_value;
end

Calling a stored function:

SELECT function_name(first_param, second_param…..);

A trigger is another type of stored procedure that is not called directly by the user, but is activated when data is modified. That is, this procedure is activated when certain conditions are met, such as INSERTor DELETE, or UPDATEdata in a certain column of this table. When a trigger fires is defined using the keywords BEFORE(trigger fires before the associated event) or AFTER(after the event).

CREATE TRIGGER trigger_name
ON table_name
AFTER INSERT
 begin
……...
 end

15. Practice

Be that as it may, the most common SQL interview question will be practice - problem solving. Trying to guess what kind of tasks you will come across is pointless, because everything depends on the sophistication of the imagination of the person opposite. Therefore, the only working option would be to get your hands on SQL queries of varying complexity. sql-ex.ru can serve as a resource for practice on various tasks . Already after the first twenty completed tasks, it will be quite difficult for your interlocutor to scare you with any SQL task. Developer Interview: Debriefing Database Questions - 11

Developer Interview: Debriefing Database Questions - 11

That's all for today: I hope that after reading this article, questions on databases will not cause any difficulties and problems. Thank you for your attention and see you soon!