Ярослав

XML Basics for Java Programmer. Part 3.2 of 3 - DOM

Published in the Random EN group
<h2>Introduction</h2>Hello to all readers! This part is dedicated to DOM. The next one will cover JAXB, which will complete this series on XML basics. First comes a little theory, and after that only practice. Let's get started.
<h2>DOM (Document Object Model) - THEORY</h2>A DOM parser reads the entire XML document at once and keeps it in memory as a tree, which we can easily navigate to reach the elements we need. Given a reference to the top element, we can therefore obtain references to all of its inner elements. The elements nested inside an element are its children, and it is their parent. Once the whole XML document has been read into memory, we simply walk its structure and perform the actions we need.
A little about the programming side of DOM in Java: DOM defines many interfaces that describe different kinds of data, and all of them inherit from one common interface, Node. Node is therefore the most general data type in DOM and can represent anything: an element, an attribute, text, and so on. Each Node has the following useful methods for retrieving information:
  1. getNodeName – get the node name.
  2. getNodeValue – get the node value.
  3. getNodeType – get the node type.
  4. getParentNode – get the node that contains the given node.
  5. getChildNodes – get all child nodes (the nodes directly inside the given node).
  6. getAttributes – get all attributes of the node.
  7. getOwnerDocument – get the document this node belongs to.
  8. getFirstChild/getLastChild – get the first/last child node.
  9. getLocalName – useful when processing namespaces: returns the name without the prefix.
  10. getTextContent – returns all text inside the element and inside all elements nested in it, including line breaks and spaces.
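The methods listed above can be tried out on a tiny document. This is only a minimal sketch: the class name NodeMethodsDemo, the parse helper and the sample XML are made up for illustration, and the XML is written on one line so that no whitespace text nodes get in the way.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;

class NodeMethodsDemo {
    // Hypothetical sample document (not one of the article's files).
    static final String XML = "<company><name>IT-Heaven</name><office floor=\"1\"/></company>";

    // Parses an XML string into a Document; checked exceptions are wrapped to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document document = parse(XML);
        Node root = document.getDocumentElement();
        Node name = root.getFirstChild();

        System.out.println(root.getNodeName());                              // company
        System.out.println(root.getNodeType() == Node.ELEMENT_NODE);         // true
        System.out.println(root.getChildNodes().getLength());                // 2
        System.out.println(name.getParentNode().getNodeName());              // company
        System.out.println(name.getTextContent());                           // IT-Heaven
        System.out.println(root.getLastChild().getAttributes().getLength()); // 1
    }
}
```

With an indented, multi-line file, getFirstChild would most likely return a whitespace text node rather than the name element, which is why the sample is kept on one line.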
Note on method 9: it always returns null unless you have called setNamespaceAware(true) on the DocumentBuilderFactory to enable namespace processing. Now, an important detail: these methods are common to all nodes, but a Node can be an element, an attribute, and more. This raises questions: what value can an element have? What child nodes can an attribute have? The answer is simple: each method behaves according to the node type. A little logic is enough to avoid confusion. For example, what attributes could an attribute possibly have? So that you do not have to try everything yourself, the official documentation has a very useful table showing how each method behaves for each node type (see the table at the top of the Node Documentation page). The most important things to remember:
  1. ONLY elements have attributes.
  2. Elements have NO value: getNodeValue returns null for an element.
  3. The name of the element node is the same as the name of the tag, and the name of the attribute node is the same as the name of the attribute.
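To see how the same methods behave on different node types, here is a small sketch. The class name NodeTypesDemo, the sample XML and the namespace URI urn:example are assumptions made up for this illustration. It also shows the getLocalName behaviour described in the note on method 9.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;

class NodeTypesDemo {
    // Hypothetical sample: one prefixed element with one ordinary attribute and some text.
    static final String XML = "<p:office xmlns:p=\"urn:example\" floor=\"1\">open</p:office>";

    // Parses the string with namespace processing switched on or off.
    static Document parse(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Node element = parse(XML, false).getDocumentElement();
        Node attribute = element.getAttributes().getNamedItem("floor");
        Node text = element.getFirstChild();

        System.out.println(element.getNodeValue());    // null: elements have no value
        System.out.println(attribute.getNodeValue());  // 1
        System.out.println(attribute.getAttributes()); // null: only elements have attributes
        System.out.println(text.getNodeName());        // #text
        System.out.println(text.getNodeValue());       // open
        System.out.println(element.getLocalName());    // null: namespace processing is off

        Node aware = parse(XML, true).getDocumentElement();
        System.out.println(aware.getNodeName());       // p:office
        System.out.println(aware.getLocalName());      // office
    }
}
```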
<h2>DOM (Document Object Model) - PRACTICE</h2>In the practical part, we will work through several kinds of tasks that involve searching for information in XML. Two of the tasks are taken from the previous article so that we can compare how convenient each approach is. Let's begin with the imports:
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
I list the imports so that you do not confuse the classes :) Task No. 1: we need to read information about all employees from the following XML file and print it to the console:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<company>
    <name>IT-Heaven</name>
    <offices>
        <office floor="1" room="1">
            <employees>
                <employee name="Maksim" job="Middle Software Developer" />
                <employee name="Ivan" job="Junior Software Developer" />
                <employee name="Franklin" job="Junior Software Developer" />
            </employees>
        </office>
        <office floor="1" room="2">
            <employees>
                <employee name="Herald" job="Middle Software Developer" />
                <employee name="Adam" job="Middle Software Developer" />
                <employee name="Leroy" job="Junior Software Developer" />
            </employees>
        </office>
    </offices>
</company>
As we can see, all the information is stored in the employee elements. To store it somewhere in our program, let's create an Employee class:
public class Employee {
    private String name, job;

    public Employee(String name, String job) {
        this.name = name;
        this.job = job;
    }

    public String getName() {
        return name;
    }

    public String getJob() {
        return job;
    }
}
Now that we have a description of the structure for storing data, we need a collection that will store employees. We will create it in the code itself. We also need to create a Document based on our XML:
public class DOMExample {
    // List for the employees read from the XML file
    private static ArrayList<Employee> employees = new ArrayList<>();

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        // Get a factory so that we can then obtain a document builder.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

        // Get a builder from the factory; it parses XML and creates the Document structure as a hierarchical tree.
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Parse the XML, creating the Document structure. We now have access to all the elements we need.
        Document document = builder.parse(new File("resource/xml_file1.xml"));
    }
}
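If you want to experiment without creating a file on disk first, the same builder can also read from an InputStream. The following is only a sketch under that assumption: the class name ParseFromStringDemo, the parseXml helper and the sample XML are hypothetical and not part of the article's solution.

```java
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class ParseFromStringDemo {
    // Builds a Document from XML held in memory instead of a file.
    static Document parseXml(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped so that callers in this sketch need no throws clause.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document document = parseXml("<company><name>IT-Heaven</name></company>");

        // getDocumentElement returns the root element of the parsed XML.
        System.out.println(document.getDocumentElement().getNodeName()); // company
    }
}
```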
Once we have the document, we have unlimited power over the entire structure of the XML file. We can fetch any element at any time and go back to re-check any data; in general, this is a more flexible approach than the one we had with SAX. For this task, we just need to find all the employee elements and then extract all the information about them. It's quite simple:
public class DOMExample {
    // List for the employees read from the XML file
    private static ArrayList<Employee> employees = new ArrayList<>();

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        // Get a factory so that we can then obtain a document builder.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

        // Get a builder from the factory; it parses XML and creates the Document structure as a hierarchical tree.
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Parse the XML, creating the Document structure. We now have access to all the elements we need.
        Document document = builder.parse(new File("resource/xml_file1.xml"));

        // Get the list of all employee elements inside the root element (getDocumentElement returns the ROOT element of the XML file).
        NodeList employeeElements = document.getDocumentElement().getElementsByTagName("employee");

        // Iterate over all employee elements
        for (int i = 0; i < employeeElements.getLength(); i++) {
            Node employee = employeeElements.item(i);

            // Get the attributes of each element
            NamedNodeMap attributes = employee.getAttributes();

            // Add the employee. An attribute is also a Node, so we read the attribute value with getNodeValue()
            employees.add(new Employee(attributes.getNamedItem("name").getNodeValue(), attributes.getNamedItem("job").getNodeValue()));
        }

        // Print information about each employee
        for (Employee employee : employees)
            System.out.println(String.format("Employee information: name - %s, job - %s.", employee.getName(), employee.getJob()));
    }
}
The solution is explained in the comments in the code itself. After looking through the code, it is worth going back to the theory and rereading it. In fact, everything is intuitive. Read the comments carefully and there should be no questions left; if there are, ask in the comments and I will answer, or simply launch IDEA and play with the code yourself if you have not done so yet. After running the code, we got the following output:
Employee information: name - Maksim, job - Middle Software Developer.
Employee information: name - Ivan, job - Junior Software Developer.
Employee information: name - Franklin, job - Junior Software Developer.
Employee information: name - Herald, job - Middle Software Developer.
Employee information: name - Adam, job - Middle Software Developer.
Employee information: name - Leroy, job - Junior Software Developer.
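As a side note, the same attributes can be read without NamedNodeMap by casting each node to org.w3c.dom.Element and calling getAttribute. The sketch below is an alternative to the solution above, not a replacement for it; the class name ElementAttributeDemo, the readEmployees helper and the shortened sample XML are assumptions made up for illustration.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ElementAttributeDemo {
    // Shortened, hypothetical version of the employees file from Task No. 1.
    static final String XML = "<employees>"
            + "<employee name=\"Maksim\" job=\"Middle Software Developer\" />"
            + "<employee name=\"Ivan\" job=\"Junior Software Developer\" />"
            + "</employees>";

    // Returns one "name - job" line per employee element.
    static List<String> readEmployees(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            List<String> result = new ArrayList<>();
            NodeList employees = document.getElementsByTagName("employee");
            for (int i = 0; i < employees.getLength(); i++) {
                // getElementsByTagName only returns elements, so the cast is safe here.
                Element employee = (Element) employees.item(i);
                // getAttribute returns an empty string when the attribute is missing.
                result.add(employee.getAttribute("name") + " - " + employee.getAttribute("job"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        readEmployees(XML).forEach(System.out::println);
    }
}
```

Because getAttribute returns an empty string rather than null for a missing attribute, this variant does not throw a NullPointerException when an employee element lacks a name or job.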
As you can see, the task was completed successfully! Let's move on to the next one :) Task No. 2: the name of an element is entered from the console, and we need to print information about all the elements inside it and their attributes, using the following XML file:
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <oracle>
        <connection value="jdbc:oracle:thin:@10.220.140.48:1521:test1" />
        <user value="secretOracleUsername" />
        <password value="111" />
    </oracle>

    <mysql>
        <connection value="jdbc:mysql:thin:@10.220.140.48:1521:test1" />
        <user value="secretMySQLUsername" />
        <password value="222" />
    </mysql>
</root>
Everything is quite simple: we get the element by the name we read from the console and then go through all of its child nodes. To do this, we recursively iterate over the child nodes of every child node that is an element. The solution to this problem:
public class DOMExample {
    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        // Reader for reading the tag name from the console
        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));

        // Get a factory so that we can then obtain a document builder.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

        // Get a builder from the factory; it parses XML and creates the Document structure as a hierarchical tree.
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Parse the XML, creating the Document structure. We now have access to all the elements we need.
        Document document = builder.parse(new File("resource/xml_file3.xml"));

        // Read the tag name to search for in the file
        String element = reader.readLine();

        // Get the list of matching elements; for convenience, we only consider the first match in the document.
        // Also note that we search within the document rather than within the root element, so that the root element itself can be found too.
        NodeList matchedElementsList = document.getElementsByTagName(element);

        // Even if there is no such element, a list is always returned; it is simply empty.
        // So, to conclude that the element is not in the file, it is enough to check the length of the list.
        if (matchedElementsList.getLength() == 0) {
            System.out.println("Tag " + element + " was not found in the file.");
        } else {
            // Get the first element.
            Node foundedElement = matchedElementsList.item(0);

            System.out.println("The element was found!");

            // If there is data inside, call the method that prints all the information
            if (foundedElement.hasChildNodes())
                printInfoAboutAllChildNodes(foundedElement.getChildNodes());
        }
    }

    /**
     * Recursive method that prints information about all nodes inside all the nodes passed as the parameter, until every node has been visited.
     * @param list List of nodes.
     */
    private static void printInfoAboutAllChildNodes(NodeList list) {
        for (int i = 0; i < list.getLength(); i++) {
            Node node = list.item(i);

            // In this file an element contains two kinds of nodes: other elements or text. The two cases are handled separately.
            if (node.getNodeType() == Node.TEXT_NODE) {
                // Filter the text, since spaces and line breaks are of no use to us. They are not information.
                String textInformation = node.getNodeValue().replace("\n", "").trim();

                if(!textInformation.isEmpty())
                    System.out.println("Text found inside the element: " + node.getNodeValue());
            }
            // If it is an element, process it as an element. Other node types (comments, for example) have no attributes
            // and getAttributes() returns null for them, so they are skipped.
            else if (node.getNodeType() == Node.ELEMENT_NODE) {
                System.out.println("Element found: " + node.getNodeName() + ", its attributes:");

                // Get the attributes
                NamedNodeMap attributes = node.getAttributes();

                // Print information about all attributes
                for (int k = 0; k < attributes.getLength(); k++)
                    System.out.println("Attribute name: " + attributes.item(k).getNodeName() + ", its value: " + attributes.item(k).getNodeValue());
            }

            // If this node still has child nodes, print all the information about them as well.
            if (node.hasChildNodes())
                printInfoAboutAllChildNodes(node.getChildNodes());
        }
    }
}
The whole solution is described in the comments, but the approach is worth restating. Picture the XML as a tree in which every item is a node: we walk from top to bottom, starting from the desired element. We recursively go through all of its nodes, and if one of those nodes is an element, we iterate through all of that element's nodes as well. After running the code, we got the following output for the root element:
The element was found!
Element found: oracle, its attributes:
Element found: connection, its attributes:
Attribute name: value, its value: jdbc:oracle:thin:@10.220.140.48:1521:test1
Element found: user, its attributes:
Attribute name: value, its value: secretOracleUsername
Element found: password, its attributes:
Attribute name: value, its value: 111
Element found: mysql, its attributes:
Element found: connection, its attributes:
Attribute name: value, its value: jdbc:mysql:thin:@10.220.140.48:1521:test1
Element found: user, its attributes:
Attribute name: value, its value: secretMySQLUsername
Element found: password, its attributes:
Attribute name: value, its value: 222
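The reason the solution has to check for text nodes is that, in an indented file, the line breaks and spaces between tags are themselves child nodes. A minimal sketch of this, assuming a hypothetical class ChildElementsDemo and a cut-down version of the file above:

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ChildElementsDemo {
    // Hypothetical, indented sample: the whitespace between the tags becomes text nodes.
    static final String XML = "<root>\n"
            + "    <oracle/>\n"
            + "    <mysql/>\n"
            + "</root>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the names of the direct children that are elements, skipping text and other nodes.
    static List<String> childElementNames(Node parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE)
                names.add(child.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        Node root = parse(XML).getDocumentElement();

        // Five children: whitespace, oracle, whitespace, mysql, whitespace.
        System.out.println(root.getChildNodes().getLength()); // 5
        System.out.println(childElementNames(root));          // [oracle, mysql]
    }
}
```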
The problem has been solved successfully! Task No. 3: the following XML file stores information about students, professors and employees; we need to read this information and print it to the console:
<?xml version="1.0" encoding="UTF-8"?>
<database>
    <students>
        <student name="Maksim" course="3" specialization="CE" />
        <student name="Stephan" course="1" specialization="CS" />
        <student name="Irvin" course="2" specialization="CE" />
    </students>

    <professors>
        <professor name="Herald" experience="7 years in University" discipline="Math" />
        <professor name="Adam" experience="4 years in University" discipline="Programming" />
        <professor name="Anton" experience="6 years in University" discipline="English" />
    </professors>

    <service>
        <member name="John" position="janitor" />
        <member name="Jordan" position="janitor" />
        <member name="Mike" position="janitor" />
    </service>
</database>
The task is quite simple, but interesting. First, we need to create 4 classes: employee, professor and student, plus a common abstract class Human that holds the name field shared by all of them. Abstract parent class
public abstract class Human {
    private String name;

    public Human(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }
}
Student
public class Student extends Human {
    private String course, specialization;

    public Student(String name, String course, String specialization) {
        super(name);
        this.course = course;
        this.specialization = specialization;
    }

    public String getCourse() {
        return course;
    }

    public String getSpecialization() {
        return specialization;
    }

    public String toString() {
        return "Hungry student " + getName() + ", year " + course + ", specializing in " + specialization;
    }
}
Professor
public class Professor extends Human {
    private String experience, discipline;

    public Professor(String name, String experience, String discipline) {
        super(name);
        this.experience = experience;
        this.discipline = discipline;
    }

    public String getExperience() {
        return experience;
    }

    public String getDiscipline() {
        return discipline;
    }

    public String toString() {
        return "Professor " + getName() + ", with experience: \"" + experience + "\", teaches " + discipline;
    }
}
Employee (the Member class)
public class Member extends Human {
    private String position;

    public Member(String name, String position) {
        super(name);
        this.position = position;
    }

    public String getPosition() {
        return position;
    }

    public String toString() {
        return "Service staff member " + getName() + ", position: " + position;
    }
}
Now that our classes are ready, we just need to write code that gets all the student, professor and member elements and then reads their attributes. For storage, we will use a collection of objects of the parent class common to all of them, Human. So, the solution to this problem:
public class DOMExample {
    // Collection for storing all the people
    private static ArrayList<Human> humans = new ArrayList<>();

    // Constants for the element names
    private static final String PROFESSOR = "professor";
    private static final String MEMBER = "member";
    private static final String STUDENT = "student";

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        // Get a factory so that we can then obtain a document builder.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

        // Get a builder from the factory; it parses XML and creates the Document structure as a hierarchical tree.
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Parse the XML, creating the Document structure. We now have access to all the elements we need.
        Document document = builder.parse(new File("resource/xml_file3.xml"));

        // Collect information about each kind of element separately
        collectInformation(document, PROFESSOR);
        collectInformation(document, MEMBER);
        collectInformation(document, STUDENT);

        // Print the information
        humans.forEach(System.out::println);
    }

    /**
     * Finds the tags with the given element name and adds all their information to the humans collection.
     * @param document Document in which to search for the elements.
     * @param element Name of the element whose tags must be found. Must be one of the constants defined above.
     */
    private static void collectInformation(Document document, final String element) {
        // Get all elements with the given tag name.
        NodeList elements = document.getElementsByTagName(element);

        // Iterate over all the elements found
        for (int i = 0; i < elements.getLength(); i++) {
            // Get all attributes of the element
            NamedNodeMap attributes = elements.item(i).getAttributes();
            String name = attributes.getNamedItem("name").getNodeValue();

            // Depending on the element type, collect the extra information specific to each subclass and then add the corresponding instance to the collection.
            switch (element) {
                case PROFESSOR: {
                    String experience = attributes.getNamedItem("experience").getNodeValue();
                    String discipline = attributes.getNamedItem("discipline").getNodeValue();

                    humans.add(new Professor(name, experience, discipline));
                } break;
                case STUDENT: {
                    String course = attributes.getNamedItem("course").getNodeValue();
                    String specialization = attributes.getNamedItem("specialization").getNodeValue();

                    humans.add(new Student(name, course, specialization));
                } break;
                case MEMBER: {
                    String position = attributes.getNamedItem("position").getNodeValue();

                    humans.add(new Member(name, position));
                } break;
            }
        }
    }
}
Note that we only need the element name to get all of these elements from the document. This greatly simplifies finding the information we need. Everything about the code is explained in the comments. Nothing new is used here that was not already present in the previous tasks. Code output:
Professor Herald, with experience: "7 years in University", teaches Math
Professor Adam, with experience: "4 years in University", teaches Programming
Professor Anton, with experience: "6 years in University", teaches English
Service staff member John, position: janitor
Service staff member Jordan, position: janitor
Service staff member Mike, position: janitor
Hungry student Maksim, year 3, specializing in CE
Hungry student Stephan, year 1, specializing in CS
Hungry student Irvin, year 2, specializing in CE
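The solution above makes one getElementsByTagName call per element name. If the set of names is not known in advance, one possible variation is to ask for every element with the special name "*" and group the results by tag. This is only a sketch of that idea, not the approach used in the article; the class name GroupByTagDemo, the namesByTag helper and the shortened sample XML are hypothetical.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupByTagDemo {
    // Shortened, hypothetical version of the database file from Task No. 3.
    static final String XML = "<database>"
            + "<students><student name=\"Maksim\"/><student name=\"Stephan\"/></students>"
            + "<professors><professor name=\"Herald\"/></professors>"
            + "</database>";

    // Groups the "name" attribute values by tag name, in document order.
    static Map<String, List<String>> namesByTag(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Map<String, List<String>> result = new LinkedHashMap<>();
            // The special name "*" matches every element in the document.
            NodeList all = document.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Element element = (Element) all.item(i);
                // Container elements such as students have no name attribute and are skipped.
                if (element.hasAttribute("name"))
                    result.computeIfAbsent(element.getTagName(), key -> new ArrayList<>())
                            .add(element.getAttribute("name"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(namesByTag(XML)); // {student=[Maksim, Stephan], professor=[Herald]}
    }
}
```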
Problem solved!
Recommendations on when to use DOM and when to use SAX: the difference between these tools lies in functionality and speed. If you need more flexible functionality and can afford to sacrifice some performance, choose DOM. If your main goal is to reduce memory costs, DOM is not the best choice, since it reads all the information from the XML file and keeps it in memory; sequential reading with SAX is therefore less expensive. In short: if you need performance, use SAX; if you need functionality, use DOM.
<h2>Conclusion</h2>Every programmer has a set of tools, and different tasks call for different tools. In the articles about SAX and DOM, my goal was to teach you how to extract information from XML files and process it the way you need. However, even if you have read these articles, you cannot yet claim to have learned how to use these tools. You should practice: test the code from the articles, understand how it works, and try to write something yourself. After all, practice is the most important thing. The last article will be published in the coming days, apparently after the end of the competition, and will be devoted to JAXB. JAXB is a tool for saving the objects in your program in XML format. That's all. I hope this article was useful, and good luck with your programming :)
Previous article: [Competition] XML Basics for a Java Programmer - Part 3.1 of 3 - SAX