Ярослав
Level 40
Днепр

XML Basics for the Java Programmer - Part 3.1 of 3 - SAX

Published in the Random EN group
Introduction

Hello to all readers of this article, which is not yet my last one. Congratulations: the hardest parts of XML are behind us. This article contains Java code: a little theory first, then practice. The SAX material alone filled 10 pages in Word, so I realized it would not fit into a single article. Part 3 is therefore split into three separate articles, however strange that may sound, in this order: SAX -> DOM -> JAXB. This article covers only SAX.

P.S. Somewhere in the course there was a task that required displaying all the inner elements of an element in an HTML file. After this article you will be able to do that without reading the file line by line with a plain BufferedReader and complicated processing algorithms; a similar solution is given in the last practical example. Let's get started :)

SAX (Simple API for XML) - THEORY

A SAX parser reads an XML file sequentially and, as it meets different parts of the document, reports events to a special event handler. There are quite a few events, but the most common and useful are the following:
  1. startDocument - the beginning of the document
  2. endDocument - the end of the document
  3. startElement - the opening of an element
  4. endElement - the closing of an element
  5. characters - text inside elements
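To make the order of these events concrete, here is a minimal sketch that records which events fire for a tiny document. The class name SaxEventTrace, the trace method, and the sample XML are made up for illustration; only the handler methods come from the list above.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SaxEventTrace {
    // Parses the given XML text and returns the SAX events in the order they fired.
    public static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startDocument() {
                    events.add("startDocument");
                }

                @Override
                public void endDocument() {
                    events.add("endDocument");
                }

                @Override
                public void startElement(String uri, String localName, String qName, Attributes attributes) {
                    events.add("startElement " + qName);
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    events.add("endElement " + qName);
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    // Text may arrive in several calls, so consecutive calls are recorded once.
                    if (!events.get(events.size() - 1).equals("characters")) {
                        events.add("characters");
                    }
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        // Hypothetical sample document, not one of the files used in this article.
        trace("<greeting><to>world</to></greeting>").forEach(System.out::println);
    }
}
```

The recorded list starts with startDocument and ends with endDocument, and every startElement is matched by an endElement in reverse order of opening, which is exactly the sequential behavior described above.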
All events are processed in an event handler: a class that you create yourself, overriding the methods for the events you care about.

Advantages: high performance thanks to the "direct", sequential way of reading the data, and low memory usage.

Disadvantages: limited functionality, which means that for non-linear tasks we have to build extra logic on top of it ourselves.

SAX (Simple API for XML) - PRACTICE

First, a list of imports so that you don't have to hunt for them or mix anything up:
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
First, we need to create a SAXParser:
public class SAXExample {
    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        // Create the factory and a parser instance
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
    }
}
As you can see, we first create a factory and then use the factory to create the parser itself. Now that we have a parser, we need a handler for its events. For convenience, we will put it in a separate class:
public class SAXExample {
    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
    }

    private static class XMLHandler extends DefaultHandler {
        @Override
        public void startDocument() throws SAXException {
            // Logic for reacting to the start of the document goes here
        }

        @Override
        public void endDocument() throws SAXException {
            // Logic for reacting to the end of the document goes here
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            // Logic for reacting to the start of an element goes here
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            // Logic for reacting to the end of an element goes here
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            // Logic for reacting to text between elements goes here
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
            // Logic for reacting to whitespace inside elements (spaces, line breaks, and so on) goes here
        }
    }
}
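A short aside on the uri, localName, and qName parameters in the skeleton above: what they contain depends on whether the factory is namespace-aware. The sketch below parses the same one-element document with both settings. The class name NamespaceNamesDemo, the prefix p, and the namespace URI are invented for this illustration.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;
import java.io.StringReader;

public class NamespaceNamesDemo {
    // Hypothetical document: the "p" prefix and the namespace URI are made up for this sketch.
    static final String XML = "<p:item xmlns:p=\"http://example.com/p\"/>";

    // Returns "uri|localName|qName" as reported for the root element.
    public static String rootNames(boolean namespaceAware) {
        String[] result = new String[1];
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            factory.newSAXParser().parse(new InputSource(new StringReader(XML)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes attributes) {
                    result[0] = uri + "|" + localName + "|" + qName;
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("namespace-aware:     " + rootNames(true));
        System.out.println("not namespace-aware: " + rootNames(false));
    }
}
```

With the parser bundled in the JDK, the namespace-aware run should report the URI and the local name item, while the default run leaves those two parts empty and reports only the qualified name p:item. The examples in this article use qName for exactly this reason: it is filled in either way.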
The XMLHandler class now contains all the methods we need to handle the events listed in the theory section. A little more theory.

About characters: if an element contains text, for example "hello", then in theory the method may be called several times in a row, even once for each individual character, because the parser is free to deliver text in chunks. The small examples in this article still work, since their text values are short and typically arrive in a single call.

About the startElement and endElement methods: uri is the namespace the element belongs to, localName is the element name without a prefix, and qName is the element name with a prefix (if there is one; otherwise just the element name). uri and localName are empty unless namespace processing is enabled in the factory, which is done with the factory method setNamespaceAware(true). After that we can get the namespace (uri) and the element name without its prefix (localName).

Task #1

We have the following XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<company>
    <name>IT-Heaven</name>
    <offices>
        <office floor="1" room="1">
            <employees>
                <employee name="Maksim" job="Middle Software Developer" />
                <employee name="Ivan" job="Junior Software Developer" />
                <employee name="Franklin" job="Junior Software Developer" />
            </employees>
        </office>
        <office floor="1" room="2">
            <employees>
                <employee name="Herald" job="Middle Software Developer" />
                <employee name="Adam" job="Middle Software Developer" />
                <employee name="Leroy" job="Junior Software Developer" />
            </employees>
        </office>
    </offices>
</company>
Our goal is to get all the information about all employees from this file. First, we need to create an Employee class:
public class Employee {
    private String name, job;

    public Employee(String name, String job) {
        this.name = name;
        this.job = job;
    }

    public String getName() {
        return name;
    }

    public String getJob() {
        return job;
    }
}
And in our main class SAXExample we need a list of all the employees:
private static ArrayList<Employee> employees = new ArrayList<>();
Now let's look carefully at where the information we need is located in the XML file. As we can see, it is all stored in the attributes of the employee elements. Since startElement has such a useful parameter as attributes, the task is fairly simple. First, let's remove the methods we don't need so that they don't clutter the code: we only need startElement. In that method we collect the information from the attributes of the employee tag. Attention:
public class SAXExample {
    private static ArrayList<Employee> employees = new ArrayList<>();

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
    }

    private static class XMLHandler extends DefaultHandler {
        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            if (qName.equals("employee")) {
                String name = attributes.getValue("name");
                String job = attributes.getValue("job");
                employees.add(new Employee(name, job));
            }
        }
    }
}
The logic is simple: if the element's name is employee, we read the information from its attributes. The attributes parameter has a useful method, getValue, which returns an attribute's value given its name, and that is what we used. Now that we have an event handler for the start of an element, we need to parse our XML file. To do that, just do this:
public class SAXExample {
    private static ArrayList<Employee> employees = new ArrayList<>();

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();

        XMLHandler handler = new XMLHandler();
        parser.parse(new File("resource/xml_file1.xml"), handler);

        for (Employee employee : employees)
            System.out.println(String.format("Employee name: %s, position: %s", employee.getName(), employee.getJob()));
    }

    private static class XMLHandler extends DefaultHandler {
        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            if (qName.equals("employee")) {
                String name = attributes.getValue("name");
                String job = attributes.getValue("job");
                employees.add(new Employee(name, job));
            }
        }
    }
}
The parse method takes the XML file (here a File object built from its path) and the handler you created. With this code we extracted the information from this XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<company>
    <name>IT-Heaven</name>
    <offices>
        <office floor="1" room="1">
            <employees>
                <employee name="Maksim" job="Middle Software Developer" />
                <employee name="Ivan" job="Junior Software Developer" />
                <employee name="Franklin" job="Junior Software Developer" />
            </employees>
        </office>
        <office floor="1" room="2">
            <employees>
                <employee name="Herald" job="Middle Software Developer" />
                <employee name="Adam" job="Middle Software Developer" />
                <employee name="Leroy" job="Junior Software Developer" />
            </employees>
        </office>
    </offices>
</company>
And we got the following output:
Employee name: Maksim, position: Middle Software Developer
Employee name: Ivan, position: Junior Software Developer
Employee name: Franklin, position: Junior Software Developer
Employee name: Herald, position: Middle Software Developer
Employee name: Adam, position: Middle Software Developer
Employee name: Leroy, position: Junior Software Developer
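For experimenting without a file on disk, the same attribute-reading idea can be sketched against an in-memory string. This is a variation, not the code from the article: the names EmployeeAttributesSketch, EmployeeHandler, and readEmployees are hypothetical, the sample XML is shortened, and the handler keeps its results in its own field instead of a static list in the outer class.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class EmployeeAttributesSketch {
    // Hypothetical handler: it owns its results instead of writing to a static field.
    static class EmployeeHandler extends DefaultHandler {
        private final List<String> employees = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if (qName.equals("employee")) {
                // getValue returns null if the attribute is missing.
                employees.add(attributes.getValue("name") + ", " + attributes.getValue("job"));
            }
        }

        List<String> getEmployees() {
            return employees;
        }
    }

    // Parses XML held in a string, so no file is needed to try it out.
    public static List<String> readEmployees(String xml) {
        EmployeeHandler handler = new EmployeeHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
        return handler.getEmployees();
    }

    public static void main(String[] args) {
        String xml = "<employees>"
                + "<employee name=\"Maksim\" job=\"Middle Software Developer\" />"
                + "<employee name=\"Ivan\" job=\"Junior Software Developer\" />"
                + "</employees>";
        readEmployees(xml).forEach(System.out::println);
    }
}
```

Keeping the list inside the handler means a fresh handler can be created for every parse, so two parses never mix their results. Whether that matters depends on the program; for a one-off main method the static list from the article works just as well.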
Task #1 is accomplished!

Task #2

We have the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<company>
    <name>IT-Heaven</name>
    <offices>
        <office floor="1" room="1">
            <employees>
                <employee>
                    <name>Maksim</name>
                    <job>Middle Software Developer</job>
                </employee>
                <employee>
                    <name>Ivan</name>
                    <job>Junior Software Developer</job>
                </employee>
                <employee>
                    <name>Franklin</name>
                    <job>Junior Software Developer</job>
                </employee>
            </employees>
        </office>
        <office floor="1" room="2">
            <employees>
                <employee>
                    <name>Herald</name>
                    <job>Middle Software Developer</job>
                </employee>
                <employee>
                    <name>Adam</name>
                    <job>Middle Software Developer</job>
                </employee>
                <employee>
                    <name>Leroy</name>
                    <job>Junior Software Developer</job>
                </employee>
            </employees>
        </office>
    </offices>
</company>
Our goal is the same: to get all the information about all employees from this file. This task is a good demonstration of how a poorly structured XML file can make the code harder to write. As you can see, the name and position are now stored as text inside the name and job elements. To read the text inside elements, we have the characters method.

For this we need a new handler class with improved logic. Don't forget that handlers are full-fledged classes that can hold logic of any complexity, so now we will tune our handler. In fact, it is enough to notice that name and job always come together, in either order, so we can save the name and the position into separate variables and create the employee once both variables are filled.

There is one catch: startElement has no parameter for the text inside the element, so we have to use characters for the text. But how do we know which element the text belongs to if these are completely different methods? My solution: we simply remember the name of the last opened element and check in characters which element we are reading.

You also need to remember that characters reads all the characters inside elements, which means that all the spaces and even the line breaks between tags are read as well. We don't need them, so we ignore this data.

Code:
public class SAXExample {
    private static ArrayList<Employee> employees = new ArrayList<>();

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();

        AdvancedXMLHandler handler = new AdvancedXMLHandler();
        parser.parse(new File("resource/xml_file2.xml"), handler);

        for (Employee employee : employees)
            System.out.println(String.format("Employee name: %s, position: %s", employee.getName(), employee.getJob()));
    }

    private static class AdvancedXMLHandler extends DefaultHandler {
        private String name, job, lastElementName;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            // Remember which element was opened last
            lastElementName = qName;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            String information = new String(ch, start, length);

            // Drop line breaks and surrounding whitespace
            information = information.replace("\n", "").trim();

            if (!information.isEmpty()) {
                if (lastElementName.equals("name"))
                    name = information;
                if (lastElementName.equals("job"))
                    job = information;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            // Once both values are known, create the employee and reset the state
            if ( (name != null && !name.isEmpty()) && (job != null && !job.isEmpty()) ) {
                employees.add(new Employee(name, job));
                name = null;
                job = null;
            }
        }
    }
}
As you can see, simply because the structure of the XML file became a bit more complicated, our code has become noticeably more complicated too. Still, the code itself is not hard to follow. Description: we created variables to store the employee data (name, job), as well as a variable lastElementName that records which element we are inside. Then, in the characters method, we filter the information: if anything is left after trimming, it is text we need, and we use lastElementName to determine whether it is a name or a position. In the endElement method we check whether all the information has been read, and if so, we create an employee and reset the variables. The output is the same as in the first example:
Employee name: Maksim, position: Middle Software Developer
Employee name: Ivan, position: Junior Software Developer
Employee name: Franklin, position: Junior Software Developer
Employee name: Herald, position: Middle Software Developer
Employee name: Adam, position: Middle Software Developer
Employee name: Leroy, position: Junior Software Developer
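One caveat about the handler above: it assumes that the text of each element arrives in a single characters call. Since a parser may split text into several calls, a more defensive variant collects the pieces in a StringBuilder and reads the result in endElement. The sketch below is one possible way to do that, not the approach used in this article; the names TextAccumulationSketch and readEmployees and the shortened sample XML are hypothetical.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TextAccumulationSketch {
    // Returns one "name - job" string per employee element found in the XML text.
    public static List<String> readEmployees(String xml) {
        List<String> employees = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                private final StringBuilder text = new StringBuilder();
                private boolean insideEmployee;
                private String name, job;

                @Override
                public void startElement(String uri, String localName, String qName, Attributes attributes) {
                    text.setLength(0); // start collecting text for the newly opened element
                    if (qName.equals("employee")) {
                        insideEmployee = true;
                    }
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length); // may be called several times per element
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    if (insideEmployee && qName.equals("name")) {
                        name = text.toString().trim();
                    } else if (insideEmployee && qName.equals("job")) {
                        job = text.toString().trim();
                    } else if (qName.equals("employee")) {
                        employees.add(name + " - " + job);
                        name = null;
                        job = null;
                        insideEmployee = false;
                    }
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
        return employees;
    }

    public static void main(String[] args) {
        String xml = "<company><name>IT-Heaven</name><employees>"
                + "<employee><name>Maksim</name><job>Middle Software Developer</job></employee>"
                + "<employee><name>Ivan</name><job>Junior Software Developer</job></employee>"
                + "</employees></company>";
        readEmployees(xml).forEach(System.out::println);
    }
}
```

The insideEmployee flag also keeps the company's own name element (IT-Heaven) from being mistaken for an employee name, something the simpler handler only avoids because the value is later overwritten.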
AdvancedXMLHandler solves the problem, but you can see that the code is more complex. From this we can conclude that storing simple text values in attributes is more often the right choice than storing them in separate elements.

One more nice task, which partially solves the JavaRush task about displaying information about an element in HTML (it will only need a little editing): here we will simply list all the elements inside a given element :)

Task #3

Given the name of an element, display the names and attributes of all the elements inside it; if the element is not found, report that. For this task we will use the following XML file:
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <oracle>
        <connection value="jdbc:oracle:thin:@10.220.140.48:1521:test1" />
        <user value="secretOracleUsername" />
        <password value="111" />
    </oracle>

    <mysql>
        <connection value="jdbc:mysql:thin:@10.220.140.48:1521:test1" />
        <user value="secretMySQLUsername" />
        <password value="222" />
    </mysql>
</root>
As you can see, there are three possible elements to search for here: root, mysql, and oracle. For the chosen one, the program displays all the information about all the elements inside it. How can we do this? Quite simply: we declare a boolean variable isEntered that indicates whether we are currently inside the element we need, and while we are inside, we read all the data in startElement. Solution code:
public class SAXExample {
    private static boolean isFound;

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();

        SearchingXMLHandler handler = new SearchingXMLHandler("root");
        parser.parse(new File("resource/xml_file3.xml"), handler);

        if (!isFound)
            System.out.println("The element was not found.");
    }

    private static class SearchingXMLHandler extends DefaultHandler {
        private String element;
        private boolean isEntered;

        public SearchingXMLHandler(String element) {
            this.element = element;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            if (isEntered) {
                System.out.println(String.format("Found element <%s>, its attributes:", qName));

                int length = attributes.getLength();
                for(int i = 0; i < length; i++)
                    System.out.println(String.format("Attribute name: %s, value: %s", attributes.getQName(i), attributes.getValue(i)));
            }

            if (qName.equals(element)) {
                isEntered = true;
                isFound = true;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (qName.equals(element))
                isEntered = false;
        }
    }
}
In this code, when we enter the element we need information about, we set the isEntered flag to true, which means that we are inside that element. While we are inside it, we simply process every new element in startElement, knowing that it is an inner element of our element, and print its name and attributes. If the element is not in the file, the isFound variable, which is set when the element is found, stays false, and a message is printed saying that the element was not found. As you can see, in the example we passed root to the SearchingXMLHandler constructor. The output for it:
Found element <oracle>, its attributes:
Found element <connection>, its attributes:
Attribute name: value, value: jdbc:oracle:thin:@10.220.140.48:1521:test1
Found element <user>, its attributes:
Attribute name: value, value: secretOracleUsername
Found element <password>, its attributes:
Attribute name: value, value: 111
Found element <mysql>, its attributes:
Found element <connection>, its attributes:
Attribute name: value, value: jdbc:mysql:thin:@10.220.140.48:1521:test1
Found element <user>, its attributes:
Attribute name: value, value: secretMySQLUsername
Found element <password>, its attributes:
Attribute name: value, value: 222
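If you adapt this idea to HTML, keep in mind that a tag can contain another tag with the same name, for example a div inside a div. A boolean flag is switched off by the first matching closing tag, so elements that follow the nested one would be missed. One possible fix, sketched here under the assumption that only element names need to be collected, is to replace the flag with a depth counter. The names NestedSearchSketch and descendantsOf and the sample markup are hypothetical.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class NestedSearchSketch {
    // Returns the names of all elements nested inside each outermost occurrence of the target element.
    public static List<String> descendantsOf(String target, String xml) {
        List<String> found = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                // 0 means we are outside the target; n > 0 means we are n levels deep inside it.
                private int depth;

                @Override
                public void startElement(String uri, String localName, String qName, Attributes attributes) {
                    if (depth > 0) {
                        found.add(qName); // any element opened inside the target is a descendant
                        depth++;
                    } else if (qName.equals(target)) {
                        depth = 1; // entered the target itself
                    }
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    if (depth > 0) {
                        depth--; // reaches 0 only when the target itself is closed
                    }
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical markup with a div nested inside a div.
        String xml = "<body><div><div><p/></div><span/></div></body>";
        System.out.println(descendantsOf("div", xml)); // prints [div, p, span]
    }
}
```

Because the counter only returns to zero when the outermost target closes, the span after the inner div is still reported. An empty result means either that the target has no inner elements or that it was not found; a separate flag like isFound would be needed to tell the two apart.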
With SearchingXMLHandler we obtained all the information about the inner elements and their attributes. The problem is solved.

Epilogue

You have seen that SAX is an interesting and quite efficient tool that can be used in different ways for different purposes. You just need to look at the problem from the right angle, as in tasks #2 and #3, where SAX offered no direct method for solving the problem but, with a little ingenuity, we found a way out. The next part of the series is devoted entirely to DOM. I hope you enjoyed getting to know SAX. Experiment and practice, and you will see that it is all quite simple. That's all. Expect the part about DOM soon, and good luck in your studies :)

Previous article: [Competition] XML Basics for a Java Programmer - Part 2 of 3
Next article: [Competition] XML Basics for a Java Programmer - Part 3.2 of 3 - DOM