Ярослав
Level 40
Днепр

XML Basics for Java Programmer. Part 2 of 3

Published in the Random EN group

Introduction

Hello, dear readers. This is the second article in the XML series, and it covers XML namespaces and XML Schema.
Until recently I knew nothing about these topics myself, but I have worked through a lot of material and will try to explain both in simple terms. I should say right away that schemas are a very advanced mechanism for validating XML documents, far more capable than DTDs, so this article does not cover them in full. Let's get started :)

XML Namespace

An XML namespace is, literally, a "space of names". Its main purpose is to keep element names in an XML file unambiguous so that nothing gets confused. Since this is a Java course, the closest analogy is Java packages. If we could put two classes with the same name side by side and use both, how would we tell which one we meant? Packages solve this problem: we place the classes in different packages and then either import the one we need or refer to it by its fully qualified name. With three classes named Example in three different packages, we can write this:
public class ExampleInvocation {
    public static void main(String[] args) {
        // Create an instance of the class from the first package.
        example_package_1.Example example1 = new example_package_1.Example();

        // Create an instance of the class from the second package.
        example_package_2.Example example2 = new example_package_2.Example();

        // Create an instance of the class from the third package.
        example_package_3.Example example3 = new example_package_3.Example();
    }
}
XML namespaces work in much the same way. If two elements have the same name (like the classes above), we put them in different namespaces (like packages); then, even though the names coincide, we always refer to a specific element from a specific namespace. For example, suppose we have two elements named oracle in one XML file: a prediction (an oracle) and an Oracle database.
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <oracle>
        <connection value="jdbc:oracle:thin:@10.220.140.48:1521:test1" />
        <user value="root" />
        <password value="111" />
    </oracle>

    <oracle>
        You will be busy all day today.
    </oracle>
</root>
When we process this XML file, we will be in serious trouble if we get the prediction where we expected the database, or vice versa. To resolve the collision, we can give each element its own namespace. There is a special attribute for this: xmlns:prefix="unique value for the namespace". We can then put the prefix in front of an element to show that it belongs to that namespace (in effect, we create a package path, the namespace, and then mark each element with the package it belongs to).
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <database:oracle xmlns:database="Unique ID #1">
        <connection value="jdbc:oracle:thin:@10.220.140.48:1521:test1" />
        <user value="root" />
        <password value="111" />
    </database:oracle>

    <oracle:oracle xmlns:oracle="Unique ID #2">
        You will be busy all day today.
    </oracle:oracle>
</root>
In this example we declared two namespaces, bound to the prefixes database and oracle, and can now put those prefixes in front of element names. Don't worry if something is still unclear; it is actually very simple. At first I wanted to get through this part of the article quickly, but I later decided that the topic deserves more attention, because it is easy to get confused or miss something. From here on, a lot of attention goes to the xmlns attribute. So, another example:
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns="https://www.standart-namespace.com/" xmlns:gun="https://www.gun-shop.com/" xmlns:fish="https://www.fish-shop.com/">
    <gun:shop>
        <gun:guns>
            <gun:gun name="Revolver" price="1250$" max_ammo="7" />
            <gun:gun name="M4A1" price="3250$" max_ammo="30" />
            <gun:gun name="9mm Pistol" price="450$" max_ammo="12" />
        </gun:guns>
    </gun:shop>

    <fish:shop>
        <fish:fishes>
            <fish:fish name="Shark" price="1000$" />
            <fish:fish name="Tuna" price="5$" />
            <fish:fish name="Capelin" price="1$" />
        </fish:fishes>
    </fish:shop>
</root>
This is ordinary XML that uses the gun namespace for elements specific to the gun shop and the fish namespace for elements specific to the fish shop. By creating the namespaces, we used the same shop element for two different things at once, a gun shop and a fish shop, and the declared namespaces tell us exactly which kind of shop each one is. This becomes most interesting with schemas, where it lets us validate different structures that use identically named elements. xmlns is the attribute that declares a namespace; it can appear on any element. An example of a namespace declaration:
xmlns:shop="https://barber-shop.com/"
The part after the colon is the prefix: a shorthand for the namespace that can then be placed in front of element names to show that they come from that namespace. The value of xmlns should be a UNIQUE STRING. This is extremely important to understand: it is very common to use website addresses (URLs or other URIs) as namespace names. This is the standard convention because a URI you control is unlikely to clash with anyone else's, BUT this is also where people get confused: a parser does not download anything from that address, it only compares the string. Just remember: a typical parser accepts ANY string as the value, but to keep it unique and conventional you should use a URI. The oracle example shows that arbitrary strings work:
xmlns:oracle="Unique ID #2"
xmlns:database="Unique ID #1"
A namespace declaration is in scope on the element that carries it and on all elements inside it, so namespaces declared on the root element can be used anywhere in the document. The last example shows this, and here is a more specific one:
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <el1:element1 xmlns:el1="Element#1 Unique String">
        <el1:innerElement>

        </el1:innerElement>
    </el1:element1>


    <el2:element2 xmlns:el2="Element#2 Unique String">
        <el2:innerElement>

        </el2:innerElement>
    </el2:element2>


    <el3:element3 xmlns:el3="Element#3 Unique String">
        <el3:innerElement>
            <el1:innerInnerElement> <!-- Not allowed: the el1 prefix is declared only on the first element, so it can be used only on element1 and the elements inside it. -->

            </el1:innerInnerElement>
        </el3:innerElement>
    </el3:element3>
</root>
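The scoping rule can be checked from Java. Below is a minimal sketch that uses the JDK's built-in DOM parser with namespace awareness switched on. The class name PrefixScopeCheck and the urn:example:... namespace values are made up for this illustration; they stand in for the "Element#N Unique String" values above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class PrefixScopeCheck {

    // The el1 prefix is used only inside the element that declares it.
    static final String IN_SCOPE =
            "<root>"
            + "<el1:element1 xmlns:el1='urn:example:el1'><el1:innerElement/></el1:element1>"
            + "</root>";

    // The el1 prefix is used inside element3, where it was never declared.
    static final String OUT_OF_SCOPE =
            "<root>"
            + "<el1:element1 xmlns:el1='urn:example:el1'><el1:innerElement/></el1:element1>"
            + "<el3:element3 xmlns:el3='urn:example:el3'><el1:innerInnerElement/></el3:element3>"
            + "</root>";

    /** Returns true if a namespace-aware parser accepts the given XML text. */
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // the factory is not namespace-aware by default
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Rethrow parse problems instead of letting the default handler print them.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // for example, a prefix that is not bound to any namespace
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("prefix used in scope:     " + parses(IN_SCOPE));
        System.out.println("prefix used out of scope: " + parses(OUT_OF_SCOPE));
    }
}
```

The second document is rejected because el1 is not bound to anything at the point where el1:innerInnerElement appears. Moving the xmlns:el1 declaration to root would make both documents parse.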
Here is an important detail about elements without a prefix. An element with no prefix belongs to no namespace at all unless a default namespace is in scope, and declaring prefixed namespaces does not change that. You can declare the default namespace explicitly: write xmlns without a colon and a prefix, followed directly by the value, and every element without a prefix in that scope will belong to this namespace. The last example used this:
<root xmlns="https://www.standart-namespace.com/" xmlns:gun="https://www.gun-shop.com/" xmlns:fish="https://www.fish-shop.com/">
We declared the default namespace explicitly so that the root element does not need the gun or fish prefix: the root element belongs to neither the fish shop nor the gun shop, so putting it in either namespace would be logically wrong. Next: if you declare xmlns:a and xmlns:b with the same value, they refer to the same namespace, and the two prefixes are just two names for it. That is why you should always use distinct values; breaking this rule can cause a lot of errors. For example, if the namespaces were declared like this:
xmlns="https://www.standart-namespace.com/" xmlns:gun="https://www.gun-shop.com/" xmlns:fish="https://www.gun-shop.com/"
Then our fish shop would become a gun shop, even though its prefix still says fish. Those are all the main points about namespaces. I spent quite a lot of time collecting, condensing, and then clearly expressing them, because the information about namespaces on the Internet is vast and often padded, so most of what is written here I learned through trial and error. If you still have questions, try the materials linked at the end of the article.
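To see what a parser actually makes of the prefixes, here is a small sketch using the JDK's DOM API. It parses a cut-down version of the shop document and reports the namespace of each shop element. The class ShopNamespaces and its method names are invented for this example; only the namespace values come from the article.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ShopNamespaces {

    /** Builds a cut-down shop document with the given values for the gun and fish prefixes. */
    static String document(String gunUri, String fishUri) {
        return "<root xmlns='https://www.standart-namespace.com/'"
                + " xmlns:gun='" + gunUri + "' xmlns:fish='" + fishUri + "'>"
                + "<gun:shop/><fish:shop/>"
                + "</root>";
    }

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getNamespaceURI() to work
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Namespace of every element whose local name is "shop", in document order. */
    static List<String> shopNamespaces(String xml) {
        NodeList shops = parse(xml).getElementsByTagNameNS("*", "shop");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < shops.getLength(); i++) {
            result.add(shops.item(i).getNamespaceURI());
        }
        return result;
    }

    /** Namespace of the unprefixed root element, taken from the default xmlns declaration. */
    static String rootNamespace(String xml) {
        return parse(xml).getDocumentElement().getNamespaceURI();
    }

    public static void main(String[] args) {
        String gun = "https://www.gun-shop.com/";
        String fish = "https://www.fish-shop.com/";
        System.out.println(shopNamespaces(document(gun, fish))); // two different namespaces
        System.out.println(shopNamespaces(document(gun, gun)));  // same namespace twice
        System.out.println(rootNamespace(document(gun, fish)));
    }
}
```

With distinct values the two shop elements come back with different namespaces. When both prefixes carry the gun shop value, the parser reports the same namespace for both, which is exactly the "fish shop becomes a gun shop" situation described above: the prefix is only a local alias, and the value is what counts.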

XML Schema

I should say right away that this article shows only the tip of the iceberg, because the topic is very broad. If you want to study schemas in more detail and learn to write schemas of any complexity yourself, the end of the article has a link that covers types, restrictions, extensions, and so on. Let's start with theory. Schemas are stored in .xsd files (XML Schema Definition) and are a more advanced and popular alternative to DTDs: they too declare elements, describe them, and so on. On top of that they add a lot of extras: type checking, namespace support, and broader functionality. Remember the drawback we mentioned when discussing DTDs, that they do not support namespaces? Now that we have covered namespaces, I can explain: if we could import two or more DTDs that declare identically named elements, we would get collisions and could not use those elements at all, because it would be unclear which one we meant. XSD solves this problem because each schema can be tied to one specific namespace. Essentially, an XSD schema usually has a target namespace, which says which namespace its elements must be in when they appear in an XML file. So in the XML file itself we only need to declare the namespaces predefined in the schemas, assign prefixes to them, and attach the corresponding schema to each one; after that we can safely use elements from a schema by writing the prefix of the namespace that schema was attached to. So, here is an example:
<?xml version="1.0" encoding="UTF-8"?>
<house>
    <address>5 Yesenina St.</address>
    <owner name="Ivan">
        <telephone>+38-094-521-77-35</telephone>
    </owner>
</house>
We want to validate it with a schema. First, we need a schema:
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema" targetNamespace="https://www.nedvigimost.com/">
    <element name="house">
        <complexType>
            <sequence>
                <element name="address" type="string" maxOccurs="unbounded" minOccurs="0" />
                <element name="owner" maxOccurs="unbounded" minOccurs="0" >
                    <complexType>
                        <sequence>
                            <element name="telephone" type="string" />
                        </sequence>
                        <attribute name="name" type="string" use="required"/>
                    </complexType>
                </element>
            </sequence>
        </complexType>
    </element>
</schema>
As you can see, schemas are XML files too: you write what you need directly in XML. This schema describes the structure of the XML file from the example above. For example, if an owner has no name attribute, validation will catch it. Also, because of the sequence element, the address must always come first, followed by the owner of the house. Elements can be simple or complex. Simple elements store only data of some type. Example:
<element name="telephone" type="string" />
This declares an element that stores a string. It must not contain any other elements. Complex elements, in contrast, can contain other elements and attributes. For them you do not specify the type attribute; instead you write a complexType inside the element.
<complexType>
    <sequence>
        <element name="address" type="string" maxOccurs="unbounded" minOccurs="0" />
        <element name="owner" maxOccurs="unbounded" minOccurs="0" >
            <complexType>
                <sequence>
                    <element name="telephone" type="string" />
                </sequence>
                <attribute name="name" type="string" use="required"/>
            </complexType>
        </element>
    </sequence>
</complexType>
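It can also be done differently: you can declare a complex type separately and then refer to it in the type attribute. While writing this example I had to bind the XML Schema namespace to a prefix instead of using it as the default one. The reason is that an unprefixed name such as type="content" is looked up in the default namespace, so with XML Schema as the default namespace it would be searched for among the built-in types. Here the default namespace is the schema's own target namespace, which is where the content type lives. In general, it turned out like this:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="https://www.nedvigimost.com/" targetNamespace="https://www.nedvigimost.com/">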
    <xs:element name="house" type="content" />

    <xs:complexType name="content">
        <xs:sequence>
            <xs:element name="address" type="xs:string" maxOccurs="unbounded" minOccurs="0" />
            <xs:element name="owner" maxOccurs="unbounded" minOccurs="0" >
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="telephone" type="xs:string" />
                    </xs:sequence>
                    <xs:attribute name="name" type="xs:string" use="required"/>
                </xs:complexType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
</xs:schema>
This way we can declare our own types separately and then refer to them in the type attribute wherever they are needed. This is very convenient because one type can be reused in different places. To finish, I would like to talk about attaching schemas to XML files. There are two ways to attach a schema: through a specific namespace, or without one.

The first way to attach a schema

The first way assumes that the schema has a target namespace, specified with the targetNamespace attribute on the schema element. It is then enough to declare THAT SAME namespace in the XML file and "load" the schema into it:
<?xml version="1.0" encoding="UTF-8"?>
<nedvig:house xmlns:nedvig="https://www.nedvigimost.com/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://www.nedvigimost.com/ example_schema1.xsd">
    <address>5 Yesenina St.</address>
    <owner name="Ivan">
        <telephone>+38-094-521-77-35</telephone>
    </owner>
</nedvig:house>
It is important to understand two lines:
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://www.nedvigimost.com/ example_schema1.xsd"
Just memorize the first line. Think of it as an object that helps load schemas where they need to go. The second line does the actual loading. schemaLocation takes a list of "namespace location" pairs separated by whitespace. The first value of a pair is the namespace, which must match the target namespace of the schema (its targetNamespace value). The second value is a relative or absolute path to the schema. Since the attribute holds a LIST, you can add a space after the schema location in the example and write another target namespace and another schema location, and so on, as many times as you like. Important: for the schema to validate anything, you need to declare this namespace and use it with a prefix. Look carefully at the last example:
<?xml version="1.0" encoding="UTF-8"?>
<nedvig:house xmlns:nedvig="https://www.nedvigimost.com/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://www.nedvigimost.com/ example_schema1.xsd">
    <address>5 Yesenina St.</address>
    <owner name="Ivan">
        <telephone>+38-094-521-77-35</telephone>
    </owner>
</nedvig:house>
We bound the target namespace to the nedvig prefix and then used it on the root element. Our elements are now validated, because the root element is in the namespace that the schema targets.
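The same check can be run from Java with the standard javax.xml.validation API. The sketch below is an illustration under a few assumptions: the class name HouseSchemaCheck and the sample documents are invented here, and the schema is handed to the validator directly from a string, so the xsi:schemaLocation attribute is left out of the sample documents.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class HouseSchemaCheck {

    // The first house schema from this article, written as one string.
    static final String XSD =
            "<schema xmlns='http://www.w3.org/2001/XMLSchema'"
            + " targetNamespace='https://www.nedvigimost.com/'>"
            + "<element name='house'><complexType><sequence>"
            + "<element name='address' type='string' maxOccurs='unbounded' minOccurs='0'/>"
            + "<element name='owner' maxOccurs='unbounded' minOccurs='0'><complexType>"
            + "<sequence><element name='telephone' type='string'/></sequence>"
            + "<attribute name='name' type='string' use='required'/>"
            + "</complexType></element>"
            + "</sequence></complexType></element></schema>";

    static final String NS = "xmlns:nedvig='https://www.nedvigimost.com/'";
    static final String ADDRESS = "<address>5 Yesenina St.</address>";
    static final String OWNER =
            "<owner name='Ivan'><telephone>+38-094-521-77-35</telephone></owner>";

    // Root element in the target namespace, children in the order the schema expects.
    static final String VALID =
            "<nedvig:house " + NS + ">" + ADDRESS + OWNER + "</nedvig:house>";
    // The owner has no name attribute, which the schema marks as required.
    static final String NO_NAME =
            "<nedvig:house " + NS + ">" + ADDRESS
            + "<owner><telephone>+38-094-521-77-35</telephone></owner></nedvig:house>";
    // The owner comes before the address, which breaks the sequence.
    static final String WRONG_ORDER =
            "<nedvig:house " + NS + ">" + OWNER + ADDRESS + "</nedvig:house>";
    // The root element has no prefix, so it is not in the target namespace.
    static final String NO_PREFIX = "<house>" + ADDRESS + OWNER + "</house>";

    static Schema compile() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
    }

    /** Returns true if the document conforms to the schema. */
    static boolean isValid(String xml) {
        try {
            compile().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the validator reports the first violation as an exception
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("valid:       " + isValid(VALID));
        System.out.println("no name:     " + isValid(NO_NAME));
        System.out.println("wrong order: " + isValid(WRONG_ORDER));
        System.out.println("no prefix:   " + isValid(NO_PREFIX));
    }
}
```

Only the first document passes. The last one fails because an unprefixed house is not in the target namespace, which is the point of the "Important" remark above. Note that the inner elements stay without a prefix: this schema does not set elementFormDefault, so only the top-level element is namespace-qualified.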

The second way to attach a schema

The second way assumes that the schema has no target namespace. Then you can simply attach it to the XML file and it will validate the file. It is done in almost the same way, except that the XML file does not need to declare any namespaces for its own elements; it just points to the schema.
<?xml version="1.0" encoding="UTF-8"?>
<house xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="example_schema1.xsd">
    <address>5 Yesenina St.</address>
    <owner name="Ivan">
        <telephone>+38-094-521-77-35</telephone>
    </owner>
</house>
As you can see, this is done with noNamespaceSchemaLocation and the path to the schema. The document is validated even though the schema has no target namespace. And the final touch: one schema can import another and then use its types and elements, so we can reuse in one schema what is already declared in others. Example:

Schema where the owner type is declared:

<?xml version="1.0" encoding="UTF-8" ?>
<schema targetNamespace="bonus" xmlns="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
        <complexType name="owner">
            <all>
                <element name="telephone" type="string" />
            </all>
            <attribute name="name" type="string" />
        </complexType>
</schema>

The second schema, which uses the owner type from the first schema:

<?xml version="1.0" encoding="UTF-8"?>
<schema targetNamespace="main" xmlns="http://www.w3.org/2001/XMLSchema" xmlns:bonus="bonus" elementFormDefault="qualified">
    <import namespace="bonus" schemaLocation="xsd2.xsd" />
    <element name="house">
        <complexType>
            <all>
              <element name="address" type="string" />
                <element name="owner" type="bonus:owner" />
            </all>
        </complexType>
    </element>
</schema>
The second schema uses the following construct:
<import namespace="bonus" schemaLocation="xsd2.xsd" />
With it, we imported the types and elements of the first schema, which live in the bonus namespace, into the second one. This gives us access to the bonus:owner type, which we then used in this line:
<element name="owner" type="bonus:owner" />
Also pay a little attention to the following line:
elementFormDefault="qualified"
This attribute is set on the schema element and means that in XML files every element declared in the schema, including nested ones, must be namespace-qualified, which in practice means written with a prefix (or covered by a default namespace). Without it, only the outermost element would need a prefix, and the elements inside it would be written without one, as in the earlier house examples. And here, finally, is an example of an XML file validated by a schema that imports another schema:
<?xml version="1.0" encoding="UTF-8"?>
<nedvig:house xmlns:nedvig="main" xmlns:bonus="bonus" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="main xsd.xsd">
    <nedvig:address>5 Yesenina St.</nedvig:address>
    <nedvig:owner name="Ivan">
        <bonus:telephone>+38-094-521-77-35</bonus:telephone>
    </nedvig:owner>
</nedvig:house>
In the line:
<bonus:telephone>+38-094-521-77-35</bonus:telephone>
we need to use the bonus prefix explicitly. It is bound to the target namespace of the first schema, where telephone is declared, and because elementFormDefault is qualified, every element must state its namespace explicitly.
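To try the import from Java, the two schemas have to be real files so that the relative schemaLocation="xsd2.xsd" can be resolved. The sketch below writes them to a temporary directory first and then validates a house document. The class name ImportedSchemaCheck, the helper house(...), and the use of a temporary directory are choices made for this illustration, not part of the original example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ImportedSchemaCheck {

    // First schema (xsd2.xsd): declares the owner type in the bonus namespace.
    static final String BONUS_XSD =
            "<schema targetNamespace='bonus' xmlns='http://www.w3.org/2001/XMLSchema'"
            + " elementFormDefault='qualified'>"
            + "<complexType name='owner'>"
            + "<all><element name='telephone' type='string'/></all>"
            + "<attribute name='name' type='string'/>"
            + "</complexType></schema>";

    // Second schema (xsd.xsd): imports the first one and uses bonus:owner.
    static final String MAIN_XSD =
            "<schema targetNamespace='main' xmlns='http://www.w3.org/2001/XMLSchema'"
            + " xmlns:bonus='bonus' elementFormDefault='qualified'>"
            + "<import namespace='bonus' schemaLocation='xsd2.xsd'/>"
            + "<element name='house'><complexType><all>"
            + "<element name='address' type='string'/>"
            + "<element name='owner' type='bonus:owner'/>"
            + "</all></complexType></element></schema>";

    /** Writes both schemas next to each other and compiles the main one. */
    static Schema compile() {
        try {
            Path dir = Files.createTempDirectory("xsd-demo");
            Path bonus = dir.resolve("xsd2.xsd");
            Path main = dir.resolve("xsd.xsd");
            Files.write(bonus, BONUS_XSD.getBytes(StandardCharsets.UTF_8));
            Files.write(main, MAIN_XSD.getBytes(StandardCharsets.UTF_8));
            dir.toFile().deleteOnExit();   // registered first, so it is deleted last
            bonus.toFile().deleteOnExit();
            main.toFile().deleteOnExit();
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(main.toFile()); // the import is resolved relative to this file
        } catch (IOException | SAXException e) {
            throw new IllegalStateException("could not compile the schemas", e);
        }
    }

    /** Builds the house document with the given qualified name for the telephone element. */
    static String house(String telephoneName) {
        return "<nedvig:house xmlns:nedvig='main' xmlns:bonus='bonus'>"
                + "<nedvig:address>5 Yesenina St.</nedvig:address>"
                + "<nedvig:owner name='Ivan'>"
                + "<" + telephoneName + ">+38-094-521-77-35</" + telephoneName + ">"
                + "</nedvig:owner></nedvig:house>";
    }

    static boolean isValid(String xml) {
        try {
            compile().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bonus:telephone  -> " + isValid(house("bonus:telephone")));
        System.out.println("nedvig:telephone -> " + isValid(house("nedvig:telephone")));
        System.out.println("telephone        -> " + isValid(house("telephone")));
    }
}
```

Only bonus:telephone is accepted. The element is declared inside the owner type of the first schema, and that schema sets elementFormDefault="qualified", so the element has to be in the bonus namespace; the nedvig prefix or no prefix at all puts it somewhere else.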

End of article

The next article will be the last in the series and will cover processing XML files with Java. We will learn to extract information in different ways, and so on. I hope this article was useful and that, even if there are errors somewhere, it taught you something new or simply helped you understand XML files better. For those who would like to explore the topic in more detail, I put together a small set of links:
  • XSD Simple Elements: start reading from this article and move forward. All the information on schemas is collected there and explained fairly clearly, though only in English. You can use a translator.

  • A video on namespaces: it is always useful to hear another point of view if the first one is not clear.

  • Namespace XML: a good and fairly comprehensive example of using namespaces.

  • XML Basics - Namespaces: another short article on namespaces.

  • The Basics of Using XML Schema to Define Elements: another extremely useful reference on schemas, but you need to read it slowly and carefully, working through the material.

That is all. I hope the links help if you want to go deeper. I went through all of these sources myself, and overall they were the most useful of everything I looked at: each one either improved my understanding of what I had already read elsewhere or taught me something new, although a lot came only from practice. So, for those who really want to understand all this well, my advice is: study namespaces, then how to attach schemas to XML files, and then how to describe document structure in schemas. And most importantly, practice. Thank you all for your attention, and good luck in programming :)
Previous article: [Competition] XML Basics for a Java Programmer - Part 1 of 3
Next article: [Competition] XML Basics for a Java Programmer - Part 3.1 of 3 - SAX