8 August 2023
XML Basics for the Java Programmer. Part 3.2 of 3 - DOM

<h2>Introduction</h2>Hello, readers! This part of the series covers DOM. The next one will be devoted to JAXB and will complete the series on XML basics. We start with a little theory and then move on to practice. Let's get started.

<h2>DOM (Document Object Model) - THEORY</h2>A DOM parser reads the entire XML document at once and keeps it in memory as a tree, which we can then freely navigate to reach the elements we need.

Given a reference to the top element, we can therefore reach all of its inner elements. The elements nested inside an element are its children, and it is their parent. Once the whole XML is in memory, we simply walk the structure and perform the actions we need.

A note on the Java side of DOM: the API consists of many interfaces that describe different kinds of data, and all of them extend one common interface, Node. Node is therefore the most general data type in DOM: a node can be an element, an attribute, a piece of text and so on. Every Node offers the following useful methods for extracting information:

getNodeName - get the node name.
getNodeValue - get the value of the node.
getNodeType - get the node type.
getParentNode - get the node that contains the given node.
getChildNodes - get all child nodes (nodes that are directly inside the given node).
getAttributes - get all attributes of the node.
getOwnerDocument - get the document this node belongs to.
getFirstChild/getLastChild - get the first/last child node.
getLocalName - useful when processing namespaces, to get the name without the prefix.
getTextContent - returns all the text inside the element and inside all elements nested in it, including line breaks and spaces.
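A minimal sketch of several of these methods in action. The XML string, the `c` prefix, the `urn:example` namespace and the class name `NodeMethodsSketch` are made up for illustration; the document is parsed from a string so that the example is self-contained.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;

class NodeMethodsSketch {
    // Made-up sample, written on one line so that no whitespace-only text nodes appear.
    static final String XML =
            "<c:company xmlns:c=\"urn:example\"><c:name>IT-Heaven</c:name></c:company>";

    // Parses an XML string; checked parser exceptions are wrapped to keep the sketch short.
    static Document parse(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    public static void main(String[] args) {
        Node root = parse(XML, true).getDocumentElement();
        Node name = root.getFirstChild();

        System.out.println(root.getNodeName());                     // c:company
        System.out.println(root.getLocalName());                    // company
        System.out.println(name.getParentNode().isSameNode(root));  // true
        System.out.println(name.getTextContent());                  // IT-Heaven
        System.out.println(root.getChildNodes().getLength());       // 1

        // Parsed without namespace awareness, the same element has no local name.
        System.out.println(parse(XML, false).getDocumentElement().getLocalName()); // null
    }
}
```

The last line is the reason the namespace setting matters: the same document yields a local name only when the factory was made namespace-aware before the builder was created.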

A note on getLocalName: it always returns null unless you call setNamespaceAware(true) on the DocumentBuilderFactory to enable namespace processing.

Now an important detail: these methods are common to every Node, yet a Node can be an element, an attribute or something else. That raises questions. What value can an element have? What child nodes can an attribute have? What attributes can an attribute have? The answer is simple: each method behaves according to the type of the Node it is called on, and a little logic is usually enough to avoid confusion. So that you do not have to try every combination yourself, the official Node documentation has a very useful table (at the top of the page) that describes what each method returns for each node type.

The most important things to remember:

Only elements have attributes.
Elements have NO value (getNodeValue returns null for them).
The name of an element node is the same as the name of its tag, and the name of an attribute node is the same as the name of the attribute.
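These rules can be checked directly. The sketch below is an illustration, not part of the original tasks: the one-line XML and the `describe` helper are made up, and the numeric types printed are the standard constants Node.ELEMENT_NODE (1), Node.ATTRIBUTE_NODE (2) and Node.TEXT_NODE (3).

```java
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;

class NodeTypeRulesSketch {
    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    // Hypothetical helper: shows what the common Node methods return for a given node.
    static String describe(Node node) {
        return "type=" + node.getNodeType()
                + ", name=" + node.getNodeName()
                + ", value=" + node.getNodeValue()
                + ", canHaveAttributes=" + (node.getAttributes() != null);
    }

    public static void main(String[] args) {
        Element employee = parseRoot("<employee name=\"Maksim\">some text</employee>");

        // Element: the name is the tag name, the value is null.
        System.out.println(describe(employee));
        // prints: type=1, name=employee, value=null, canHaveAttributes=true

        // Attribute: the name is the attribute name, the value is the attribute value.
        System.out.println(describe(employee.getAttributeNode("name")));
        // prints: type=2, name=name, value=Maksim, canHaveAttributes=false

        // Text: the name is always "#text", the value is the text itself.
        System.out.println(describe(employee.getFirstChild()));
        // prints: type=3, name=#text, value=some text, canHaveAttributes=false
    }
}
```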

<h2>DOM (Document Object Model) - PRACTICE</h2>In the practical part we will work through several kinds of tasks for finding information in XML. Two of them are taken from the previous article so that the approaches are easy to compare. Let's begin with the imports:

import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;

I list the imports so that you do not mix up the classes :) Task 1: we need to read information about all employees from the following XML file and print it to the console:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<company>
    <name>IT-Heaven</name>
    <offices>
        <office floor="1" room="1">
            <employees>
                <employee name="Maksim" job="Middle Software Developer" />
                <employee name="Ivan" job="Junior Software Developer" />
                <employee name="Franklin" job="Junior Software Developer" />
            </employees>
        </office>
        <office floor="1" room="2">
            <employees>
                <employee name="Herald" job="Middle Software Developer" />
                <employee name="Adam" job="Middle Software Developer" />
                <employee name="Leroy" job="Junior Software Developer" />
            </employees>
        </office>
    </offices>
</company>

As we can see, all the information is stored in the employee elements. To keep it somewhere in our program, let's create an Employee class:

public class Employee {
    private String name, job;

    public Employee(String name, String job) {
        this.name = name;
        this.job = job;
    }

    public String getName() {
        return name;
    }

    public String getJob() {
        return job;
    }
}

Now that we have a description of the structure for storing data, we need a collection that will store employees. We will create it in the code itself. We also need to create a Document based on our XML:

public class DOMExample {
    // List of employees read from the XML file
    private static ArrayList<Employee> employees = new ArrayList<>();

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        // Get the factory that will later give us a document builder.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

        // Get a builder from the factory; it parses XML and creates the Document structure as a hierarchical tree.
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Parse the XML and create the Document structure. We now have access to every element we need.
        Document document = builder.parse(new File("resource/xml_file1.xml"));
    }
}
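If you want to experiment without creating a file on disk, the builder can also read from a string. This is a minimal sketch under that assumption; the class name `InMemoryParseSketch` and the tiny XML document are made up for illustration.

```java
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.IOException;
import java.io.StringReader;

class InMemoryParseSketch {
    // Builds a Document from an XML string instead of a File.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // parse(InputSource) reads from any Reader, so no file is needed.
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML string", e);
        }
    }

    public static void main(String[] args) {
        Document document = parse("<company><name>IT-Heaven</name></company>");

        System.out.println(document.getDocumentElement().getNodeName());       // company
        System.out.println(document.getElementsByTagName("name").getLength()); // 1
    }
}
```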

Once we have the Document, we have full control over the whole structure of the XML file. We can fetch any element at any time and go back to re-check any data, which makes this approach more flexible than SAX. For this task we only need to find all the employee elements and then extract the information from them. It's simple enough:

public class DOMExample {
    // List of employees read from the XML file
    private static ArrayList<Employee> employees = new ArrayList<>();

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        // Get the factory that will later give us a document builder.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

        // Get a builder from the factory; it parses XML and creates the Document structure as a hierarchical tree.
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Parse the XML and create the Document structure. We now have access to every element we need.
        Document document = builder.parse(new File("resource/xml_file1.xml"));

        // Get the list of all employee elements inside the root element (getDocumentElement returns the ROOT element of the XML file).
        NodeList employeeElements = document.getDocumentElement().getElementsByTagName("employee");

        // Iterate over all employee elements
        for (int i = 0; i < employeeElements.getLength(); i++) {
            Node employee = employeeElements.item(i);

            // Get the attributes of each element
            NamedNodeMap attributes = employee.getAttributes();

            // Add the employee. An attribute is also a Node, so we read its value with getNodeValue()
            employees.add(new Employee(attributes.getNamedItem("name").getNodeValue(), attributes.getNamedItem("job").getNodeValue()));
        }

        // Print the information about each employee
        for (Employee employee : employees)
            System.out.println(String.format("Employee: name - %s, job - %s.", employee.getName(), employee.getJob()));
    }
}

The solution is explained in its own comments. After going through the code, it is worth returning to the theory and reading it again; most of it is intuitive. Read the comments carefully and there should be no questions left. If some remain, ask in the comments and I will answer, or open IntelliJ IDEA and experiment with the code yourself if you haven't already. Running the code produces the following output:

Employee: name - Maksim, job - Middle Software Developer.
Employee: name - Ivan, job - Junior Software Developer.
Employee: name - Franklin, job - Junior Software Developer.
Employee: name - Herald, job - Middle Software Developer.
Employee: name - Adam, job - Middle Software Developer.
Employee: name - Leroy, job - Junior Software Developer.
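A side note on reading attributes: the solution above goes through NamedNodeMap, but the nodes returned by getElementsByTagName are always elements, so they can also be cast to org.w3c.dom.Element and read with getAttribute. The sketch below shows that variant; the class name, the `readEmployees` helper and the inline XML are made up for illustration.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ElementAttributeSketch {
    // Returns one "name - job" line per employee element found in the XML string.
    static List<String> readEmployees(String xml) {
        Document document;
        try {
            document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }

        NodeList nodes = document.getElementsByTagName("employee");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // The cast is safe here: getElementsByTagName returns only element nodes.
            Element employee = (Element) nodes.item(i);
            result.add(employee.getAttribute("name") + " - " + employee.getAttribute("job"));
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<employees>"
                + "<employee name=\"Maksim\" job=\"Middle Software Developer\"/>"
                + "<employee name=\"Ivan\"/>"
                + "</employees>";

        // The second employee has no job attribute: getAttribute returns an empty string for it.
        readEmployees(xml).forEach(System.out::println);
    }
}
```

One difference to keep in mind: getAttribute returns an empty string for a missing attribute, whereas getNamedItem returns null, so calling getNodeValue() on the result would fail with a NullPointerException.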

The task is solved! Let's move on to the next one :) Task 2: an element name is entered from the console, and we need to print information about all the elements inside it, together with their attributes, from the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <oracle>
        <connection value="jdbc:oracle:thin:@10.220.140.48:1521:test1" />
        <user value="secretOracleUsername" />
        <password value="111" />
    </oracle>

    <mysql>
        <connection value="jdbc:mysql:thin:@10.220.140.48:1521:test1" />
        <user value="secretMySQLUsername" />
        <password value="222" />
    </mysql>
</root>

It's quite simple: we find the element by the name we read from the console and then walk through all of its child nodes. For every child node that is itself an element, we in turn walk through its child nodes. The solution:

public class DOMExample {
    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        // Reader for reading the tag name from the console
        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));

        // Get the factory that will later give us a document builder.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

        // Get a builder from the factory; it parses XML and creates the Document structure as a hierarchical tree.
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Parse the XML and create the Document structure. We now have access to every element we need.
        Document document = builder.parse(new File("resource/xml_file3.xml"));

        // Read the name of the tag to look for in the file
        String element = reader.readLine();

        // Get the list of matching elements; for convenience we only consider the first match in the document.
        // Note that we search inside the document rather than inside the root element, so that the root element itself can be found too.
        NodeList matchedElementsList = document.getElementsByTagName(element);

        // Even if there is no such element, a list is always returned; it is simply empty.
        // So to state that the element is not in the file, it is enough to check the length of the list.
        if (matchedElementsList.getLength() == 0) {
            System.out.println("Tag " + element + " was not found in the file.");
        } else {
            // Take the first element.
            Node foundedElement = matchedElementsList.item(0);

            System.out.println("Element found!");

            // If there is data inside, call the method that prints all the information
            if (foundedElement.hasChildNodes())
                printInfoAboutAllChildNodes(foundedElement.getChildNodes());
        }
    }

    /**
     * Recursive method that prints information about all nodes inside the nodes passed as a parameter, until every node has been visited.
     * @param list List of nodes.
     */
    private static void printInfoAboutAllChildNodes(NodeList list) {
        for (int i = 0; i < list.getLength(); i++) {
            Node node = list.item(i);

            // An element mainly contains two kinds of nodes: other elements or text. The two cases are handled separately.
            if (node.getNodeType() == Node.TEXT_NODE) {
                // Filter the text: spaces and line breaks are not information, so we do not need them.
                String textInformation = node.getNodeValue().replace("\n", "").trim();

                if (!textInformation.isEmpty())
                    System.out.println("Text found inside the element: " + textInformation);
            }
            // If it is an element, handle it as an element. Other node types (comments, for example) have no attributes and are skipped.
            else if (node.getNodeType() == Node.ELEMENT_NODE) {
                System.out.println("Found element: " + node.getNodeName() + ", its attributes:");

                // Get the attributes
                NamedNodeMap attributes = node.getAttributes();

                // Print information about all attributes
                for (int k = 0; k < attributes.getLength(); k++)
                    System.out.println("Attribute name: " + attributes.item(k).getNodeName() + ", its value: " + attributes.item(k).getNodeValue());
            }

            // If this node has nodes of its own, print all the information about them as well.
            if (node.hasChildNodes())
                printInfoAboutAllChildNodes(node.getChildNodes());
        }
    }
}

The solution is fully described in the comments, but the approach is worth illustrating with the tree picture from the theory section.

Suppose we need to print information about the html tag. We have to walk from the top of the tree downward, starting at the root. Every line in the picture is a node.

In the solution, we recursively go from the requested element through all of its nodes; whenever one of those nodes is itself an element, we iterate over all of its nodes as well. Running the code for the root element produces the following output:

Element found!
Found element: oracle, its attributes:
Found element: connection, its attributes:
Attribute name: value, its value: jdbc:oracle:thin:@10.220.140.48:1521:test1
Found element: user, its attributes:
Attribute name: value, its value: secretOracleUsername
Found element: password, its attributes:
Attribute name: value, its value: 111
Found element: mysql, its attributes:
Found element: connection, its attributes:
Attribute name: value, its value: jdbc:mysql:thin:@10.220.140.48:1521:test1
Found element: user, its attributes:
Attribute name: value, its value: secretMySQLUsername
Found element: password, its attributes:
Attribute name: value, its value: 222
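The flat output hides how deeply each element is nested. As a possible variation (not the solution from this article), the same recursive walk can carry a depth counter and indent each element accordingly. The class name, the `outline` helper and the shortened XML are made up for illustration.

```java
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class IndentedTreeSketch {
    // Parses an XML string and returns its root element as a Node.
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    // Returns one line per element, indented by two spaces per nesting level.
    static List<String> outline(Node root) {
        List<String> lines = new ArrayList<>();
        collect(root, 0, lines);
        return lines;
    }

    private static void collect(Node node, int depth, List<String> lines) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // whitespace, text and comments are ignored in this sketch
        }

        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  ");
        }
        line.append(node.getNodeName());

        NamedNodeMap attributes = node.getAttributes();
        for (int k = 0; k < attributes.getLength(); k++) {
            Node attribute = attributes.item(k);
            line.append(' ').append(attribute.getNodeName()).append('=').append(attribute.getNodeValue());
        }
        lines.add(line.toString());

        // Recurse into the children one level deeper.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), depth + 1, lines);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>\n  <oracle>\n    <user value=\"u1\"/>\n  </oracle>\n"
                + "  <mysql><user value=\"u2\"/></mysql>\n</root>";
        outline(parseRoot(xml)).forEach(System.out::println);
    }
}
```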

Task solved! Task 3: the following XML file stores information about students, professors and service staff; we need to read it and print it to the console:

<?xml version="1.0" encoding="UTF-8"?>
<database>
    <students>
        <student name="Maksim" course="3" specialization="CE" />
        <student name="Stephan" course="1" specialization="CS" />
        <student name="Irvin" course="2" specialization="CE" />
    </students>

    <professors>
        <professor name="Herald" experience="7 years in University" discipline="Math" />
        <professor name="Adam" experience="4 years in University" discipline="Programming" />
        <professor name="Anton" experience="6 years in University" discipline="English" />
    </professors>

    <service>
        <member name="John" position="janitor" />
        <member name="Jordan" position="janitor" />
        <member name="Mike" position="janitor" />
    </service>
</database>

The task is simple but interesting. First we need four classes: a student, a professor and a service staff member, plus a common abstract class Human that holds the name field shared by all of them.

The abstract parent class

public abstract class Human {
    private String name;

    public Human(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }
}

Student

public class Student extends Human {
    private String course, specialization;

    public Student(String name, String course, String specialization) {
        super(name);
        this.course = course;
        this.specialization = specialization;
    }

    public String getCourse() {
        return course;
    }

    public String getSpecialization() {
        return specialization;
    }

    @Override
    public String toString() {
        return "Hungry student " + getName() + ", year " + course + ", specialization " + specialization;
    }
}

Professor

public class Professor extends Human {
    private String experience, discipline;

    public Professor(String name, String experience, String discipline) {
        super(name);
        this.experience = experience;
        this.discipline = discipline;
    }

    public String getExperience() {
        return experience;
    }

    public String getDiscipline() {
        return discipline;
    }

    @Override
    public String toString() {
        return "Professor " + getName() + ", with experience \"" + experience + "\", teaches " + discipline;
    }
}

Service staff member

public class Member extends Human {
    private String position;

    public Member(String name, String position) {
        super(name);
        this.position = position;
    }

    public String getPosition() {
        return position;
    }

    @Override
    public String toString() {
        return "Service staff member " + getName() + ", position: " + position;
    }
}

Now that the classes are ready, all that remains is to write code that finds all the student, professor and member elements and reads their attributes. For storage we will use a collection of the parent class common to all of them, Human. Here is the solution:

public class DOMExample {
    // Collection that stores all the people
    private static ArrayList<Human> humans = new ArrayList<>();

    // Constants for the element names
    private static final String PROFESSOR = "professor";
    private static final String MEMBER = "member";
    private static final String STUDENT = "student";

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        // Get the factory that will later give us a document builder.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

        // Get a builder from the factory; it parses XML and creates the Document structure as a hierarchical tree.
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Parse the XML and create the Document structure. We now have access to every element we need.
        Document document = builder.parse(new File("resource/xml_file3.xml"));

        // Collect the information for each kind of element separately
        collectInformation(document, PROFESSOR);
        collectInformation(document, MEMBER);
        collectInformation(document, STUDENT);

        // Print the information
        humans.forEach(System.out::println);
    }

    /**
     * Finds elements by tag name and adds the information from them to the humans collection.
     * @param document The document in which to search for elements.
     * @param element Name of the element whose tags should be found. Must be one of the constants defined above.
     */
    private static void collectInformation(Document document, final String element) {
        // Get all elements with the given tag name.
        NodeList elements = document.getElementsByTagName(element);

        // Iterate over all the elements found
        for (int i = 0; i < elements.getLength(); i++) {
            // Get all attributes of the element
            NamedNodeMap attributes = elements.item(i).getAttributes();
            String name = attributes.getNamedItem("name").getNodeValue();

            // Depending on the element type, collect the extra information specific to each subclass and then add the corresponding instance to the collection.
            switch (element) {
                case PROFESSOR: {
                    String experience = attributes.getNamedItem("experience").getNodeValue();
                    String discipline = attributes.getNamedItem("discipline").getNodeValue();

                    humans.add(new Professor(name, experience, discipline));
                } break;
                case STUDENT: {
                    String course = attributes.getNamedItem("course").getNodeValue();
                    String specialization = attributes.getNamedItem("specialization").getNodeValue();

                    humans.add(new Student(name, course, specialization));
                } break;
                case MEMBER: {
                    String position = attributes.getNamedItem("position").getNodeValue();

                    humans.add(new Member(name, position));
                } break;
            }
        }
    }
}

Note that the element name alone is enough to get all such elements from anywhere in the document, which greatly simplifies finding the information you need. Everything else about the code is explained in the comments; nothing new was used compared with the previous tasks. Code output:

Professor Herald, with experience "7 years in University", teaches Math
Professor Adam, with experience "4 years in University", teaches Programming
Professor Anton, with experience "6 years in University", teaches English
Service staff member John, position: janitor
Service staff member Jordan, position: janitor
Service staff member Mike, position: janitor
Hungry student Maksim, year 3, specialization CE
Hungry student Stephan, year 1, specialization CS
Hungry student Irvin, year 2, specialization CE
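The solution above searches the document three times, once per tag name, so the output is grouped in the order of the calls. As a possible alternative (a sketch, not the approach of this article), getElementsByTagName also accepts the special value "*", which matches every element in document order, so one pass can dispatch on the tag name. The class name, the `describePeople` helper and the shortened XML are made up, and plain strings stand in for the Human subclasses to keep the example self-contained.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class SinglePassSketch {
    // Walks all elements once and describes the ones that represent people.
    static List<String> describePeople(String xml) {
        Document document;
        try {
            document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }

        // "*" matches every element; the list follows document order.
        NodeList all = document.getElementsByTagName("*");
        List<String> people = new ArrayList<>();
        for (int i = 0; i < all.getLength(); i++) {
            Element element = (Element) all.item(i);
            switch (element.getTagName()) {
                case "student":
                case "professor":
                case "member":
                    people.add(element.getTagName() + " " + element.getAttribute("name"));
                    break;
                default:
                    // Containers such as <database> or <students> are skipped.
                    break;
            }
        }
        return people;
    }

    public static void main(String[] args) {
        String xml = "<database>"
                + "<students><student name=\"Maksim\"/></students>"
                + "<professors><professor name=\"Herald\"/></professors>"
                + "<service><member name=\"John\"/></service>"
                + "</database>";

        // Students come first here, because they come first in the document.
        describePeople(xml).forEach(System.out::println);
    }
}
```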

Problem solved!

<h2>Recommendations: when to use DOM and when to use SAX</h2>The difference between these tools lies in functionality and resource cost. If you need flexible access to the document and can afford the overhead, DOM is your choice. If your main goal is to keep memory usage low, DOM is not the best option, because it reads the whole XML file and keeps it in memory; SAX, which reads the document sequentially, is cheaper. In short: if you need performance, use SAX; if you need functionality, use DOM.

<h2>Conclusion</h2>Every programmer has a set of tools, and the right one depends on the task. In my articles on SAX and DOM, I aimed to teach you how to extract information from XML files and process it the way you need. However, reading these articles is not enough to claim that you have learned these tools. Practice: test the code from the articles, understand how it works and try to write something yourself. After all, practice is what matters most. The last article will appear in the coming days, probably right at the end of the contest, and will be devoted to JAXB. JAXB is a tool for saving the objects of your program in XML format. That's all. I hope this article was useful, and good luck with your programming :)

Previous article: [Contest] XML Basics for Java Programmers - Part 3.1 of 3 - SAX