JavaRush /Java Blog /Random EN /Developer interview: analysis of database questions

Developer interview: analysis of database questions

Published in the Random EN group
Hi all! We are all here working towards one goal - to become Java Developers . Perhaps the most important stage on the path to becoming a professional is the technical interview. As a rule, the interviewer goes through the main topics, asking a couple of questions. In this article, we will talk about one such key topic - databases . Let's look at the most frequently asked questions and try to answer them without diving deeply into the material, because in this case the volume of the book will not be enough for us! So, let's go.Developer interview: analysis of database questions - 1

1. What are databases? What types are they divided into?

What is meant by DBMS?

Developer interview: analysis of database questions - 2A database (DB) is an organized structure designed for storing, changing and processing interrelated information, mainly large volumes. In other words, a database is a structured data storage. For example, a phone book.

Database types

  1. A relational database is a collection of data with predefined relationships between them. Data is stored as a set of tables consisting of columns and rows. Tables store information about the objects represented in the database. Each table column stores a specific data type, and each cell stores an attribute value.
  2. Non-relational systems (NoSQL) are systems designed for specific data models with flexible schemas. In other words, these are databases that store data not in the form of tabular schemes, rows and columns, but in other formats.
You can read more about non-relational databases in this article: A Guide to NoSQL for Developers . A database management system (DBMS) is a set of software with which the user can create databases (DBs) and perform various operations on them: supplement, update, delete, select, etc. The DBMS guarantees the safety, integrity, security of data storage and allows you to grant access to database administration. As an example, MySql is a DBMS that provides access to a relational database or MongoDB for a non-relational one.

2. What is normalization? Normalized form? How many forms of normalization are there? Name the first three.

Normalization is the process of organizing and structuring data in a database, which provides greater database flexibility by eliminating redundancy and inconsistency of dependencies. Normal form is a property of a table, considered in the context of normalization, which characterizes the table in terms of simplicity and correctness of the structure. Normal form is defined as a set of requirements that a table must satisfy. There are six normal forms in total, but in practice no more than the first three are used:
  1. First normal form:
    • All attributes are simple (that is, atomic and indivisible);
    • All data is scalar (that is, positive);
    • There are no duplicate rows (for this, a primary key is created for each row).
  2. Second normal form:
    • The conditions of the first normal form are met;
    • Each non-key attribute references a primary key.
  3. Third normal form:
    • The conditions of the second normal group are met;
    • Nonkey fields are independent of other nonkey fields: they can only be associated with a primary key.

3. Denormalization

Denormalization is the deliberate reduction or violation of forms of database normalization, usually to speed up reading from the database by adding redundant data. In general, this is a process inverse to normalization. This happens because the theory of normal forms is not always applicable in practice. For example, non-atomic values ​​are not always “evil”: sometimes even the opposite. In some cases, additional joins are necessary when executing queries, especially when processing a large amount of information. This can ultimately improve performance. Databases intended for analytics are often denormalized to speed up query execution. For example, you will often sample some data for reports where non-key columns will be related to each other. You intentionally remove the third form of normalization and combine everything in one table for ease of sampling - so that you don’t have to make additional queries to other tables.

4. Indexes

An index is a sorted set of values ​​associated with a table or view with a specific column that speeds up data retrieval. That is, this is a kind of index: like the alphabet in a phone book, which helps us when searching by last name. If used correctly, this feature can greatly improve performance when working with large databases. Or you can lower it a lot. To speed up searching, these keys are stored in a balanced tree structure through which the search is performed. As a rule, indexes need to be entered on the fields that are most often searched. You should think about creating an index no earlier than when you have at least 10 thousand records. Otherwise, you will not see a noticeable result, because premature optimization is EVIL . And how can an index affect system performance, you ask? When new data is inserted or old data is deleted, the balanced tree structure will be recalculated. Actually, the more data and indexes, the more trees need to be counted. Imagine the situation: you have about 20,000 records and 7 indexes on this table. That is, when inserting data, you need to recalculate 7 trees, each with 20,000 records. Strictly speaking, using indexes for tables to which data will be frequently added/deleted is not recommended at all. Finally, I would like to note that indexes for columns in which the value is often found nullwill not be as effective, so it is not worth adding them to such columns.

What is the difference between clustered and non-clustered indexes in SQL?

Clustered:

  • Provides physical order for the selected field;
  • If a table has a clustered index, it is said to be clustered;
  • No more than one index per table is needed;
  • In MySQL, a clustered index is not explicitly specified by the user, because if you do not define a PRIMARY KEY on your table, MySQL finds the first index UNIQUEwhere all key columns are NOT NULL, and InnoDB uses it as the clustered index.

Non-clustered:

  • Up to 999 nonclustered indexes are possible on a single table;
  • Contains a pointer to rows with real data in the table;
  • Does not provide physical order;
  • For non-clustered indexes, there are separate tables with sorted data, namely, one table for one column on which the index is located, therefore, when requesting data that is not part of a given field, the query will first be performed on the field in this table, and only then the additional query against a row in the original table.
Creating a non-clustered index:
CREATE INDEX index_name ON table_name(column_name)

6. What is a composite index?

Composite index - built with sending to several columns at the same time. In other words, it is a complex index consisting of several columns. Such indexes are used when more than one column appears in one query. Creating a composite index:
CREATE INDEX index_name ON table_name(first_column_name, second_column_name, third_column_name)
Typically, these indexes are used when data in multiple columns is logically related.

7. What is a covering index? Unique index?

A covering index is an index that is sufficient to answer a query without accessing the table itself. Using this index, you can get the entire row of data, but in fact this is simply not necessary. Because you don't have to go directly to the source table and can answer using just the index, covering indexes are slightly faster to use. At the same time, do not forget that the more columns, the more cumbersome and slow the index itself becomes. So you shouldn't abuse this. Above we talked about clustered and non-clustered indexes, which can be unique . This means that no two fields have the same value for the index key. Otherwise, the index will not be unique, because several rows may contain the same value. An example of creating a unique non-clustered index:
CREATE UNIQUE INDEX index_name ON table_name(column_name)

8. What is a primary key

Primary key is a field in a table that identifies each row in a database table. There can be only one such field in a table, and all values ​​must be unique. Didn't remind you of anything? Developer interview: analysis of database questions - 3After all, a primary key is nothing more than a unique, clustered index. As a rule, primary keys are created when creating a table:
CREATE TABLE table_name(
column_name int PRIMARY KEY,..)
A restriction will be automatically added to this column - NOT NULL. You can also set a key for an already created table:
ALTER TABLE table_name ADD PRIMARY KEY (column_name);
If a primary key is added in the manner described above, then the values ​​of the fields specified as the primary key ( column_name) are checked to ensure they do not contain null values ​​(a constraint will also be added - NOT NULL).

What is a foreign key?

A foreign key is a property created to provide a relationship between tables. Typically, a foreign key is set on columns in a subtable and points to one of the columns from the main table. Can be specified as when creating a table:
CREATE TABLE table_name{
column_name int,..
FOREIGN KEY(column_name) REFERENCES another_table_name(another_table_column_name) }
So after creating the table:
ALTER TABLE table_name
ADD FOREIGN KEY(column_name) REFERENCES another_table_name(another_table_column_name));
You can set the behavior of a foreign key when manipulating the field it refers to. Manipulations can be of the following ON DELETEtypes ON UPDATE: Possible behavior options:
  • CASCADE— with this property, rows from the dependent table will be automatically deleted or changed when related rows are deleted or changed in the main table;
  • SET NULL— with this property, when a related row is deleted or updated from the main table, the value NULLfor the foreign key column will be set;
  • NO ACTION— rejects attempts to delete or change rows in the main table if there are related rows in the dependent table;
  • RESTRICT- equivalent to NO ACTION;
  • SET DEFAULT- with this property, when a related row is deleted or updated from the main table, the default value (if any) for the foreign key column will be set.
Usage example:
CREATE TABLE table_name{
column_name int,..
FOREIGN KEY(column_name) REFERENCES another_table_name(another_table_column_name) ON UPDATE CASCADE ON DELETE CASCADE }
ON DELETEIf the behavior for and is not explicitly set ON UPDATE, the behavior will be set to RESTRICT.

10. Types of connections between tables (Join)

The connection between tables is provided based on common data (fields). This happens using an operator JOIN, an operation that matches rows from one table with rows in another. The mapping is done so that the columns of both tables are adjacent, although they can be obtained from separate tables. And if we have common fields for three tables, we can display their data as one common table. However, it is worth considering that the fewer tables that are joined, the faster the query will run. So, the types JOIN:
  • INNER JOIN- a connection that shows only those data from the first table that correspond to some data from the second table. The rest go down.Developer interview: analysis of database questions - 4
  • LEFT JOIN- a connection that shows all the data from the first table and the corresponding data from the second, if any. If there is no corresponding data, the fields for data from the second table will be empty.Developer interview: analysis of database questions - 5
  • RIGHT JOIN- a connection that shows all the data from the second table and the corresponding data from the first, if any. If there is no corresponding data, the fields for the data from the first table will be empty.Developer interview: analysis of database questions - 6
  • FULL JOIN- a connection that shows all the data from the first and second tables. If there is no related data in the other table, the fields for that data will be empty.Developer interview: analysis of database questions - 7
  • CROSS JOIN- a cross join in which each row of the first table is joined to each row of the second table (each to each). That is, if two tables have 3 rows each, after this join we will get a result of 9 rows.Developer interview: analysis of database questions - 8
Example Join(inner):
SELECT *
FROM first_table
INNER JOIN second_table ON first_table.some_column = second_table.some_column

11. What is an ACID property in a database?

A - Atomicity , ensures that no transaction is partially committed to the system. Either all of its suboperations are performed, or none. For example, transferring money from a bank to another account involves two operations:
  1. Transfer money to a bank account.
  2. Transfer money from a bank account to a specific account.
But anything can happen. For example, they will go to the bank, and then some error will occur and the second operation will not be completed. Or vice versa: only the second operation will be performed. Therefore, these actions are carried out within one transaction, and the result is either all or nothing. C - Consistency : Each successful transaction always records only resolvable results. This ensures that all restrictions are met (for example, NOT NULL), otherwise the transaction will be rolled back. And - isolation : during the execution of a transaction, parallel transactions should not affect its result. This gives us the ability to hide non-final data states from everyone. Actually, this is why unsuccessful transactions cannot break anything. A little lower we will get acquainted with transaction isolation levels. D - Durability : If a transaction is completed, then you can be sure that the changes it made will not be canceled due to some failure.

12. Transaction isolation levels

Each isolation level allows/prohibits certain actions (opportunities):
  • phantom reading - within the same transaction, the same data request gives different results, which occurs due to the addition of data by another (parallel) transaction.
  • non-repeating read - within the same transaction, the same data request gives different results, which occurs due to the change or deletion of data by another (parallel) transaction.
  • "dirty" read - reading data added or changed by a transaction that will not subsequently be rolled back;
  • lost update - when different transactions change the same block of data at the same time, all changes except the last one are lost (similar to a “race condition” in multithreading).
For convenience, we consider the isolation levels and their capabilities in the table:
Insulation levels Phantom reading Non-repetitive reading “dirty” reading lost update
SERIALIZABLE + + + +
REPEATABLE_READ - + + +
READ_COMMITTED - - + +
READ_UNCOMMITTED - - - +

13. What is SQL injection?

SQL injection is one of the methods of hacking a website, the essence of which is the injection of some SQL code into the data through GETqueries POSTor Cookies. If a website performs such injections, it is possible to gain access to the database and hack the application. For example, we know the name of some variable. Let's say column_namewith type boolean. If the system is susceptible to injections, we can add OR column_name=trueand then write everything we need from the database. ORwill create an OR condition, and our expression after it will always be true, which will take us further. An attack on a website like SQL injection is possible due to improper processing of incoming data used in SQL queries. When connecting to a database using JDBC , you use various Statements. To increase security, it is necessary to use PreparedStatementinstead of the usual one Statement, since when used, Statementthe query strings and values ​​are simply added together, making injections possible. In turn, there PreparedStatementis a specific request template, and the data is inserted into it with the quotation marks reflected. As a result, SQL injections will be perceived only as a string representation of some field. To protect against SQL injections, you can use checks based on regular expressions (you can read more about regular expressions in this article ). Developer interview: analysis of database questions - 9Another option is to set a limit on the number of characters of incoming parameters: for example, if you should receive a number no more than 9999, a limit of four incoming characters will do. It will reduce the risk of hacking using SQL injections. You can learn more about security in Java from the article “Security in Java: best practices” .

14. What are stored procedures? Stored functions? Trigger?

Stored procedures in SQL are an entity in the database, which is a set of SQL instructions that is compiled once and stored on the server. In a word, this is an analogue of methods in Java. Stored procedures can perform actions on data, both normal queries and some actions that are not available for normal queries. A procedure is an SQL entity that is created once and then called by passing arguments. The advantage of this approach is that these instructions can be reused more than once. Stored procedures improve performance, enhance programming capabilities, and support data security features. Let's consider creating a procedure:
CREATE PROCEDURE procedure_name (first_param some_type, second_param some_type..)
 begin
……...
 end
Calling the procedure:
CALL procedure_name (first_param, second_param…..);
A stored function is a type of stored procedure. The difference between a function is that it always returns only a single value, while a procedure always returns a set of values. Stored procedures cannot be mixed with regular SQL, while a stored function can - and this is its advantage. On the other hand, stored functions have many more limitations than procedures. Creating a stored function:
CREATE FUNCTION function_name (first_param, second_param…..)
RETURNS some_type
 begin
……...
RETURN some_value;
end
Calling a stored function:
SELECT function_name(first_param, second_param…..);
A trigger is another type of stored procedure that is not called directly by the user, but is activated when data is modified. That is, this procedure is activated when certain conditions are met, such as, INSERTor DELETE, or UPDATEdata in a certain column of a given table. When a trigger is fired is determined using the keywords BEFORE(trigger fires before the associated event) or AFTER(after the event).
CREATE TRIGGER trigger_name
ON table_name
AFTER INSERT
 begin
……...
 end

15. Practice

Be that as it may, the most common SQL question at an interview will be practice - problem solving. There is no point in trying to guess which tasks you will come across, because everything depends on the sophistication of the imagination of the person opposite. Therefore, the only working option would be to get better at SQL queries of varying complexity. sql-ex.ru can serve as a resource for practicing on various tasks . After the first twenty completed tasks, it will be quite difficult for your interlocutor to scare you with any SQL task. Developer interview: analysis of database questions - 11That’s all for today: I hope that after reading this article, questions about databases will not cause any difficulties or problems. Thank you for your attention and see you again!
Comments
TO VIEW ALL COMMENTS OR TO MAKE A COMMENT,
GO TO FULL VERSION