8 August 2023

XML in Java: What is it?

Hello! Today we will get acquainted with another data format: XML. This is a very important topic. When working on real Java applications, you will almost certainly run into XML-related tasks. The format is used almost everywhere in Java development (we’ll find out exactly why below), so I recommend that you don't skim this lecture but work through it thoroughly, and study additional literature and links along the way :) The time will definitely not be wasted. So, let's start with the simple questions: “what” and “why”!

What is XML?

XML stands for eXtensible Markup Language. You may already be familiar with another markup language: you have probably heard of HTML, which is used to create web pages :)

HTML and XML are even similar in appearance:

HTML 1

<h1>title</h1>
<p>paragraph</p>
<p>paragraph</p>

XML 1

<headline>title</headline>
<paragraph>paragraph</paragraph>
<paragraph>paragraph</paragraph>

HTML 2

<h1>title</h1>
<p>paragraph</p>
<p>paragraph</p>

XML 2

<chief>title</chief>
<paragraph>paragraph</paragraph>
<paragraph>paragraph</paragraph>

In other words, XML is a language for describing data.

Why is XML needed?

XML was originally invented for more convenient storage and transmission of data, including over the Internet. It has a number of advantages that help it cope with this task. First, it is easy to read for both humans and computers. I think you can easily understand what this XML file describes:

<?xml version="1.0" encoding="UTF-8"?>
<book>
   <title>Harry Potter and the Philosopher’s Stone</title>
   <author>J. K. Rowling</author>
   <year>1997</year>
</book>

The computer also easily understands this format. Second, since the data is stored as plain text, compatibility problems are unlikely when transferring it from one computer to another. It is important to understand that XML is not executable code but a data description language. After you have described the data using XML, you need to write code (for example, in Java) that can send, receive, and process this data.
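To make the last point concrete, here is a minimal sketch of Java code that reads a book description like the one above. It uses the DOM parser from the standard `javax.xml.parsers` package. The class name `BookReader` and the helper methods are made up for this illustration, and the XML is inlined as a string only to keep the example self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class BookReader {

    // A book description like the one above, inlined for the sketch.
    static final String BOOK_XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
          + "<book>"
          + "<title>Harry Potter and the Philosopher's Stone</title>"
          + "<author>J. K. Rowling</author>"
          + "<year>1997</year>"
          + "</book>";

    // Parses the XML text and returns the root element (<book> here).
    static Element parseRoot(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    // Returns the text of the first descendant element with the given tag name.
    static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Element book = parseRoot(BOOK_XML);
        System.out.println(childText(book, "title") + " by " + childText(book, "author")
                + " (" + childText(book, "year") + ")");
    }
}
```

The XML itself does nothing: it is the Java code that decides which tags to read and what to do with their contents.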

How does XML work?

Its main component is tags: these are the things in angle brackets:

<book>
</book>

There are opening and closing tags. The closing tag has an additional symbol, “/”, as you can see in the example above. Each opening tag must have a corresponding closing tag. Together they show where the description of each element in the file begins and ends. Tags can be nested! In our book example, the <book> tag has 3 subtags: <title>, <author> and <year>. Nesting is not limited to one level: subtags can have their own subtags, and so on. This structure is called a tag tree. Let's look at the tree using the example of an XML file that describes a car dealership:

<?xml version="1.0" encoding="UTF-8"?>
<carstore>
   <car category="truck">
       <model lang="en">Scania R 770</model>
       <year>2005</year>
       <price currency="US dollar">200000.00</price>
   </car>
   <car category="sedan">
       <model lang="en">Ford Focus</model>
       <year>2012</year>
       <price currency="US dollar">20000.00</price>
   </car>
   <car category="sport">
       <model lang="en">Ferrari 360 Spider</model>
       <year>2018</year>
       <price currency="US dollar">150000.00</price>
   </car>
</carstore>

Here we have a top-level tag, <carstore>. It is also called the “root” tag. <carstore> has one kind of child tag, <car>, which appears three times. Each <car>, in turn, has 3 child tags of its own: <model>, <year> and <price>. Each tag can have attributes, which carry additional important information. In our example, the <model> tag has a “lang” attribute, the language in which the name of the model is written:

<model lang="en">Scania R 770</model>

This way we can indicate that the model name is written in English. Our <price> tag has a “currency” attribute.

<price currency="US dollar">150000.00</price>

This way we can indicate that the price of the car is in US dollars. Thus, XML has a “self-describing” syntax. You can add any information you need to describe the data. You can also add a line at the beginning of the file indicating the XML version and the encoding in which the data is written. It's called the “prolog” and looks like this:

<?xml version="1.0" encoding="UTF-8"?>

We use XML version 1.0 and UTF-8 encoding. This line is optional, but it can be useful if, for example, your file contains text in different languages. We mentioned that XML stands for “extensible markup language”, but what does “extensible” mean? It means that the format is well suited for creating new versions of your objects and files. For example, suppose we want our car showroom to start selling motorcycles too! At the same time, the program needs to support both versions of <carstore>: the old one (without motorcycles) and the new one. Here is our old version:

<?xml version="1.0" encoding="UTF-8"?>
<carstore>
   <car category="truck">
       <model lang="en">Scania R 770</model>
       <year>2005</year>
       <price currency="US dollar">200000.00</price>
   </car>
   <car category="sedan">
       <model lang="en">Ford Focus</model>
       <year>2012</year>
       <price currency="US dollar">20000.00</price>
   </car>
   <car category="sport">
       <model lang="en">Ferrari 360 Spider</model>
       <year>2018</year>
       <price currency="US dollar">150000.00</price>
   </car>
</carstore>

And here is the new, expanded one:

<?xml version="1.0" encoding="UTF-8"?>
<carstore>
   <car category="truck">
       <model lang="en">Scania R 770</model>
       <year>2005</year>
       <price currency="US dollar">200000.00</price>
   </car>
   <car category="sedan">
       <model lang="en">Ford Focus</model>
       <year>2012</year>
       <price currency="US dollar">20000.00</price>
   </car>
   <car category="sport">
       <model lang="en">Ferrari 360 Spider</model>
       <year>2018</year>
       <price currency="US dollar">150000.00</price>
   </car>
   <motorcycle>
       <title lang="en">Yamaha YZF-R6</title>
       <year>2018</year>
       <price currency="Russian Ruble">1000000.00</price>
       <owner>Vasia</owner>
   </motorcycle>
   <motorcycle>
       <title lang="en">Harley Davidson Sportster 1200</title>
       <year>2011</year>
       <price currency="Euro">15000.00</price>
       <owner>Petia</owner>
   </motorcycle>
</carstore>

That is how easily we added a description of motorcycles to our file :) Note that we are not required to give motorcycles the same child tags as cars. Motorcycles, unlike cars, have an <owner> element. This will not prevent a computer (or a person) from reading the data.
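Here is a rough sketch of how a Java program could walk such a tag tree and read attributes without caring whether an item is a car or a motorcycle. The class name `CarStoreReader` and the method `describeItems` are invented for this example, and the XML is a shortened version of the extended file with one car and one motorcycle.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CarStoreReader {

    // A shortened version of the extended file: one car and one motorcycle.
    static final String STORE_XML =
            "<carstore>"
          + "<car category=\"truck\">"
          + "<model lang=\"en\">Scania R 770</model>"
          + "<year>2005</year>"
          + "<price currency=\"US dollar\">200000.00</price>"
          + "</car>"
          + "<motorcycle>"
          + "<title lang=\"en\">Yamaha YZF-R6</title>"
          + "<year>2018</year>"
          + "<price currency=\"Russian Ruble\">1000000.00</price>"
          + "<owner>Vasia</owner>"
          + "</motorcycle>"
          + "</carstore>";

    // Describes every child element of the root as "tag: price currency".
    static List<String> describeItems(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> lines = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip text nodes (e.g. whitespace between tags)
            }
            Element item = (Element) node;
            Element price = (Element) item.getElementsByTagName("price").item(0);
            lines.add(item.getTagName() + ": " + price.getTextContent()
                    + " " + price.getAttribute("currency"));
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        describeItems(STORE_XML).forEach(System.out::println);
    }
}
```

Running `main` prints `car: 200000.00 US dollar` and `motorcycle: 1000000.00 Russian Ruble`. The extra <owner> tag is simply ignored, because this code never asks for it.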

Differences between XML and HTML

We have already said that XML and HTML look very similar, so it is important to know how they differ.

Firstly, they are used for different purposes. HTML is for marking up web pages. For example, if you need to create a website, with HTML you can specify: “The menu should be in the top right corner. It should have such and such buttons.” In other words, the purpose of HTML is to display data. XML is for storing and transmitting information in a form convenient for both humans and computers. This format contains no instructions on how the data should be displayed: that depends on the code of the program that reads it.

Secondly, there is a key technical difference. HTML tags are predefined. In other words, to create a heading (for example, a large caption at the top of the page), HTML uses only the <h1></h1> tags (and <h2></h2>, <h3></h3> for smaller headings). You cannot create headings in HTML using tags with other names. XML does not use predefined tags. You can give tags any names you want: <header>, <title>, <idontknow2121>.

Conflict resolution

The freedom that XML provides can also lead to some problems. For example, the same entity (say, a car) can be used by a program for different purposes. Suppose we have an XML file that describes cars. However, our programmers did not agree among themselves in advance, and now, in addition to data about real cars, our XML also includes data about toy models! Moreover, they have the same tags and attributes. Our program receives the following XML file. How can we tell a real car from a toy model?

<?xml version="1.0" encoding="UTF-8"?>
<carstore>
   <car category="truck">
       <model lang="en">Scania R 770</model>
       <year>2005</year>
       <price currency="US dollar">200000.00</price>
   </car>
   <car category="sedan">
       <model lang="en">Ford Focus</model>
       <year>2012</year>
       <price currency="US dollar">100.00</price>
   </car>
</carstore>

Prefixes and namespaces will help us here. To separate toy cars from real ones in our program (and, in general, any toy things from their real prototypes), we introduce two prefixes - “real” and “toy”.

<real:car category="truck">
   <model lang="en">Scania R 770</model>
   <year>2005</year>
   <price currency="US dollar">200000.00</price>
</real:car>
<toy:car category="sedan">
   <model lang="en">Ford Focus</model>
   <year>2012</year>
   <price currency="US dollar">100.00</price>
</toy:car>

Now our program will be able to distinguish between entities! Anything with the toy prefix will be classified as a toy :) However, we are not finished yet. To use prefixes, we need to register each of them as a namespace. Well, actually, “register” is a strong word :) You just need to come up with a unique name for each of them. It's like with classes: a class has a short name (Cat) and a full name with all the packages (zoo.animals.Cat). To create unique namespace names, a URI is usually used. Sometimes this is an Internet address where the functions and purpose of the namespace are described in detail. But it does not have to be a valid Internet address. Very often, projects simply use URI-like strings that help track the hierarchy of namespaces. Here's an example:

<?xml version="1.0" encoding="UTF-8"?>
<carstore xmlns:real="http://testproject.developersgroup1.companyname/department2/namespaces/real"
         xmlns:toy="http://testproject.developersgroup1.companyname/department2/namespaces/toy">
<real:car category="truck">
   <model lang="en">Scania R 770</model>
   <year>2005</year>
   <price currency="US dollar">200000.00</price>
</real:car>
<toy:car category="sedan">
   <model lang="en">Ford Focus</model>
   <year>2012</year>
   <price currency="US dollar">100.00</price>
</toy:car>
</carstore>

Of course, there is no site on the Internet at the address http://testproject.developersgroup1.companyname/department2/namespaces/real. But the name carries useful information: the developer group “developersgroup1” from the “department2” department is responsible for creating the “real” namespace. If we need to add new names or discuss possible conflicts, we know where to turn. Sometimes a real Internet address with a description of the namespace is used as its unique name, for example, when a large company's project will be used by millions of people around the world. But this is not always done: there is a discussion of this issue on Stack Overflow. In practice, the requirement to use URIs as namespace names is not strictly enforced: parsers compare the names as plain strings, so even a random one will work:

xmlns:real="nvjneasiognipni4435t9i4gpojrmeg"

But there are a number of advantages to using URIs. You can read more about this here.
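On the Java side, telling the two kinds of cars apart might look like the sketch below. It relies on the standard DOM API: the parser has to be made namespace-aware, after which `getElementsByTagNameNS` selects elements by namespace name rather than by prefix. The class `NamespaceDemo`, its methods, and the shortened `example.com` namespace names are placeholders chosen for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceDemo {

    // Placeholder namespace names; any unique strings would do.
    static final String REAL_NS = "http://example.com/namespaces/real";
    static final String TOY_NS = "http://example.com/namespaces/toy";

    static final String XML =
            "<carstore xmlns:real=\"" + REAL_NS + "\" xmlns:toy=\"" + TOY_NS + "\">"
          + "<real:car category=\"truck\"><model>Scania R 770</model></real:car>"
          + "<toy:car category=\"sedan\"><model>Ford Focus</model></toy:car>"
          + "</carstore>";

    // Finds all <car> elements that belong to the given namespace.
    static NodeList carsIn(String xml, String namespaceUri) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // off by default; needed for the *NS methods
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagNameNS(namespaceUri, "car");
    }

    // Returns the text inside the first <car> of the given namespace.
    static String firstCarText(String xml, String namespaceUri) throws Exception {
        return carsIn(xml, namespaceUri).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("real: " + firstCarText(XML, REAL_NS));
        System.out.println("toy:  " + firstCarText(XML, TOY_NS));
    }
}
```

Note that the code looks elements up by the namespace name, not by the prefix. The prefixes “real” and “toy” could be renamed in the file without changing the Java code, as long as the `xmlns` declarations stay the same.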

Core XML Standards

XML standards are a set of extensions that add additional functionality to XML files. There are a lot of them, but we'll look only at the most important ones and find out what they allow you to do.

AJAX - one of the most famous XML standards. It allows you to change the content of a web page without reloading it! Sounds cool? :) You can try this technology yourself here.

XSLT - allows you to convert XML text to other formats. For example, using XSLT, you can transform XML into HTML! The purpose of XML, as we have already said, is to describe data, not to display it. But with XSLT we can get around this limitation! Here is a sandbox with a working example, where you can see for yourself how it works :)

XML DOM - allows you to get, change, add or remove individual elements of an XML file. Here's a small example of how it works. We have a books.xml file:

<bookstore>
   <book category="cooking">
       <title lang="en">Everyday Italian</title>
       <author>Giada De Laurentiis</author>
       <year>2005</year>
       <price>30.00</price>
   </book>
   <book category="children">
       <title lang="en">Harry Potter</title>
       <author>J. K. Rowling</author>
       <year>2005</year>
       <price>29.99</price>
   </book>
</bookstore>

There are two books in it. Each book has a title element, <title>. Here we use JavaScript to get all the book titles from our XML file and display the first of them on the page:

<!DOCTYPE html>
<html>
<body>

<p id="demo"></p>

<script>
// Request books.xml asynchronously
var xhttp = new XMLHttpRequest();
xhttp.onreadystatechange = function() {
    // readyState 4 = request finished, status 200 = OK
    if (this.readyState == 4 && this.status == 200) {
        myFunction(this);
    }
};
xhttp.open("GET", "books.xml", true);
xhttp.send();

// Show the text of the first <title> element in the "demo" paragraph
function myFunction(xml) {
    var xmlDoc = xml.responseXML;
    document.getElementById("demo").innerHTML =
        xmlDoc.getElementsByTagName("title")[0].childNodes[0].nodeValue;
}
</script>

</body>
</html>

Again, I recommend seeing how this example works using a sandbox :)

DTD (“document type definition”) - allows you to define the list of elements allowed for some entity in an XML file. For example, we are working on a bookstore website, and all the development teams have agreed that for the book element, only the title, author, and year child elements should be specified in the XML files. But how can we protect ourselves from carelessness? Very easily!

<?xml version="1.0"?>
<!DOCTYPE book [
       <!ELEMENT book (title,author,year)>
       <!ELEMENT title (#PCDATA)>
       <!ELEMENT author (#PCDATA)>
       <!ELEMENT year (#PCDATA)>
       ]>

<book>
   <title>The Lord of The Rings</title>
   <author>John R.R. Tolkien</author>
   <year>1954</year>
</book>

Here we have defined the list of valid child elements for <book>. Try adding a new element there, and a validating parser will immediately report an error!

<book>
   <title>The Lord of The Rings</title>
   <author>John R.R. Tolkien</author>
   <year>1954</year>
   <mainhero>Frodo Baggins</mainhero>
</book>

Error! “Element mainhero is not allowed here”

There are many other XML standards. You can get acquainted with each of them and dig deeper into the code on the W3Schools website (section “Important XML Standards”). And in general, if you need information on XML, you can find almost everything there :)

Well, our lecture has come to an end. It's time to get back to tasks! :) See you!
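For those who want to try the DTD check from Java before moving on, here is a minimal sketch. It assumes the standard JDK parser and turns on DTD validation with `setValidating(true)`. The class `DtdCheck` and the method `validate` are invented names, and the exact wording of the error messages depends on the parser, so the code only collects them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class DtdCheck {

    // The DTD from the lecture, embedded in the document itself.
    static final String DTD =
            "<?xml version=\"1.0\"?>"
          + "<!DOCTYPE book ["
          + "<!ELEMENT book (title,author,year)>"
          + "<!ELEMENT title (#PCDATA)>"
          + "<!ELEMENT author (#PCDATA)>"
          + "<!ELEMENT year (#PCDATA)>"
          + "]>";

    static final String VALID = DTD
          + "<book><title>The Lord of The Rings</title>"
          + "<author>John R.R. Tolkien</author><year>1954</year></book>";

    // Same book plus an element that the DTD does not declare.
    static final String INVALID = DTD
          + "<book><title>The Lord of The Rings</title>"
          + "<author>John R.R. Tolkien</author><year>1954</year>"
          + "<mainhero>Frodo Baggins</mainhero></book>";

    // Parses the document with DTD validation on and returns the validity errors.
    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // check the document against its DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("valid book errors:   " + validate(VALID));
        System.out.println("invalid book errors: " + validate(INVALID));
    }
}
```

Validity problems are reported to the error handler as recoverable errors, so the parser still finishes reading the file. It is up to your code to decide whether to reject the document.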
