Set up a machine learning algorithm and develop your first predictive function using Java. Self-driving cars, facial recognition systems and voice assistants are all developed using machine learning technologies and frameworks. And this is only the first wave. Over the next 10 years, a new generation of products will transform our world, giving rise to new approaches to the development of programs, products and applications.
As a Java programmer, you want to catch this wave now that tech companies are starting to invest heavily in machine learning. What you learn today you can build on for the next five years. But where to start? This article aims to answer that question. You'll get a first impression of the principles of machine learning by following a short guide to implementing and training a machine learning algorithm. After learning about the structure of a learning algorithm and the features you can use to train it, evaluate it, and select the function that provides the best prediction accuracy, you'll get an overview of how to use a JVM framework (Weka) to build machine learning solutions. This article focuses on supervised machine learning because it is the approach most commonly used in developing intelligent applications.
Machine learning and artificial intelligence
Machine learning evolved from the field of artificial intelligence, which aims to create machines that can imitate human intelligence. Although the term "machine learning" comes from computer science, artificial intelligence itself is not a new field of science.
The Turing test, developed by the mathematician Alan Turing in the early 1950s, is one of the first tests designed to determine whether a computer has true intelligence. According to the Turing test, a computer demonstrates human intelligence if it can impersonate a person without the human it is talking to realizing that they are talking to a machine.
Many popular machine learning approaches today are based on ideas that are decades old. What has changed over the last decade is that computing (including distributed computing platforms) now provides enough power to apply machine learning algorithms. Most of them require a huge number of matrix multiplications and other mathematical calculations. Twenty years ago, computing technology that could handle such calculations simply did not exist, but now it is a reality. Machine learning algorithms allow programs to improve their quality and expand their capabilities without human intervention. A program developed using machine learning is able to independently update or extend its own code.
Supervised learning vs unsupervised learning
Supervised and unsupervised learning are the two most popular approaches to machine learning. Both require feeding the machine a huge number of data records in which it can find relationships and from which it can learn. Each such data record is usually called a "feature vector". For an individual house, for example, the feature vector might contain features such as the total area of the house, the number of rooms, and the year the house was built.
In supervised learning, a machine learning algorithm is trained to answer questions related to feature vectors. To train the algorithm, it is fed a set of feature vectors and their associated labels. Each label is provided by a person (the teacher) and contains the correct "answer" to the question being asked. The learning algorithm analyzes the feature vectors and their correct labels to find internal structure and relationships between them. This way the machine learns to answer questions correctly. As an example, consider an intelligent application for real estate trading. It can be trained using feature vectors that include the size, number of rooms, and year built for a set of houses. A person must assign each house a label with its correct price based on these factors. By analyzing this data, the application should train itself to answer the question, "How much money can I get for this house?"
Once the training process is completed, new input data is no longer labeled. The machine must be able to answer questions correctly even for unknown, unlabeled feature vectors.
In unsupervised learning, the algorithm is designed to predict answers without human labeling (or even without being asked a question). Instead of determining a label or outcome, unsupervised learning algorithms use large data sets and computing power to discover previously unknown relationships. For example, in consumer product marketing, unsupervised learning can be used to identify hidden relationships or groupings of customers, which can ultimately help improve a marketing program or create a new one. In this article, we will focus on supervised machine learning; it is currently the most commonly used approach.
Supervised Machine Learning
All machine learning is based on data. For a supervised machine learning project, you need to label the data in a way that provides meaningful answers to the question being asked. In Table 1 below, each house record is labeled with a house price. By identifying the relationship between the record data and the price of a house, the algorithm should eventually be able to predict the market price of houses not included in the list. (Note that the area of the house is given in square meters and the price of the house in euros.)
Table 1. List of houses
| Feature: house area | Feature: number of rooms | Feature: age of the house | Label: expected house price |
|---|---|---|---|
| 90 m² / 295 ft | 2 rooms | 23 years | €249,000 |
| 101 m² / 331 ft | 3 rooms | n/a | €338,000 |
| 1330 m² / 4363 ft | 11 rooms | 12 years | €6,500,000 |
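To make the idea of labeled records concrete, the rows of Table 1 could be held in Java roughly as follows. This is only a minimal sketch: the class `LabeledHouse` and the method `table1()` are hypothetical names, and encoding the missing age ("n/a") as `Double.NaN` is an assumption of this sketch rather than something the table prescribes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical container for one labeled training record: features plus label.
class LabeledHouse {
    final double[] features; // { area in m², number of rooms, age in years }
    final double price;      // label: expected price in euros

    LabeledHouse(double[] features, double price) {
        this.features = features.clone(); // defensive copy
        this.price = price;
    }

    // The three rows of Table 1. The missing age ("n/a") is encoded as NaN,
    // which is an assumption of this sketch.
    static List<LabeledHouse> table1() {
        List<LabeledHouse> houses = new ArrayList<>();
        houses.add(new LabeledHouse(new double[] { 90.0, 2.0, 23.0 }, 249_000.0));
        houses.add(new LabeledHouse(new double[] { 101.0, 3.0, Double.NaN }, 338_000.0));
        houses.add(new LabeledHouse(new double[] { 1330.0, 11.0, 12.0 }, 6_500_000.0));
        return houses;
    }

    public static void main(String[] args) {
        for (LabeledHouse house : table1()) {
            System.out.printf("area=%.0f m2 -> label=%.0f EUR%n", house.features[0], house.price);
        }
    }
}
```

A real project would have to decide how to treat missing values such as the unknown age before training; the sketch only shows where the features end and the label begins.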
In the early stages, you'll likely label the data manually, but eventually you can teach your program to do it on its own. You've probably already seen this approach at work in email clients, where moving an email to the Spam folder answers the question "Is this email spam?" Each time you answer, you train the program to recognize emails you don't want to see. The application's spam filter is trained to flag messages from the same source or with the same content and to handle them according to the appropriate rules. Labeled data sets are required for training and testing purposes only. Once this phase is completed, the machine learning algorithm works on unlabeled data. For example, you can feed a prediction algorithm a new, unlabeled house record, and it should automatically predict the expected price of the house based on the "knowledge" gained from the training data.
How a machine learns to predict
The challenge in supervised machine learning is finding the appropriate prediction function for a given question. Mathematically, the challenge is to find a function that takes a variable x as input and returns the predicted value y. This hypothesis function (hθ) is the result of the training process. The hypothesis function is often also called the target function or prediction function.
y = hθ(x)
In most cases, x is an array of data. In our example, it is a two-element array that describes a house by its area and its number of rooms. An array of such values is a feature vector. Given a concrete target function, we can use it to make a prediction for each feature vector x. To predict the price of a house, you call the target function with a feature vector such as { 101.0, 3.0 }, consisting of the area of the house and the number of rooms:
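Example 1: Calling the target function
// h is the target function (java.util.function.Function); its implementation is left open here
Function<Double[], Double> h = ...;
Double[] x = new Double[] { 101.0, 3.0 };  // feature vector: { house area, number of rooms }
double y = h.apply(x);                      // predicted price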
In the source code of Example 1, the values in the array x represent the feature vector of the house. The value y returned by the target function is the predicted price of the house. The goal of machine learning is to determine a target function that works as accurately as possible on unknown, unseen inputs. In machine learning, the target function (hθ) is sometimes called a model. This model is the result of the learning process.
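Example 1 leaves the implementation of h open. To see the call run end to end, any `Function<Double[], Double>` can be plugged in. The sketch below uses a hand-written lambda as a stand-in: the class name `TargetFunctionDemo` is hypothetical and the two weights are arbitrary placeholder values, not the result of any training.

```java
import java.util.function.Function;

class TargetFunctionDemo {
    // Placeholder target function: NOT a trained model. The weights 3000.0 and
    // 10000.0 are arbitrary illustration values chosen for this sketch.
    static final Function<Double[], Double> h =
            x -> 3000.0 * x[0] + 10000.0 * x[1];

    public static void main(String[] args) {
        Double[] x = new Double[] { 101.0, 3.0 }; // { house area, number of rooms }
        double y = h.apply(x);                    // predicted price
        System.out.println("Predicted price: " + y);
    }
}
```

The point of the sketch is the shape of the call, not the numbers: a trained model would replace the lambda while the calling code stays the same.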
Based on labeled training examples, the learning algorithm looks for structures or patterns in the training data. From these, it builds a model that generalizes well from that data. As a rule, the learning process is exploratory. In most cases, the process is repeated many times using different learning algorithms and configurations. In the end, all models are evaluated against performance metrics, and the best one is selected. That model is then used to compute predictions for future unlabeled data.
Linear regression
To teach a machine to "think," you first need to choose the learning algorithm you will use. Linear regression is one example. It is one of the simplest and most popular supervised machine learning algorithms. The algorithm assumes that the relationship between the input features and the output label is linear. The general linear regression function below returns the predicted value by summing the elements of the feature vector, each multiplied by a parameter θ (theta). The theta parameters are used during the training process to adapt, or "tune," the regression function to the training data.
hθ(x) = θ₀ * 1 + θ₁ * x₁ + ... + θₙ * xₙ
In the linear regression function, the theta parameters and the feature parameters are numbered with subscripts. The subscript indicates the position of the theta parameter (θ) and the feature parameter (x) within their vectors. Note that the feature x₀ is a constant offset term that is set to the value 1 for computational purposes. As a result, the index of the domain-specific features, such as the area of the house, starts at x₁. So, if x₁ holds the first value of the house feature vector (house area), then x₂ takes the next value (number of rooms), and so on. Example 2 shows a Java implementation of the linear regression function, mathematically denoted as hθ(x). For simplicity, calculations are performed using the data type double. The apply() method expects the first element of the array to have been set to 1.0 outside of this function.
Example 2: Linear Regression in Java
import java.util.Arrays;
import java.util.function.Function;

public class LinearRegressionFunction implements Function<Double[], Double> {
    private final double[] thetaVector;

    LinearRegressionFunction(double[] thetaVector) {
        // Defensive copy so later changes to the caller's array do not affect this function
        this.thetaVector = Arrays.copyOf(thetaVector, thetaVector.length);
    }

    @Override
    public Double apply(Double[] featureVector) {
        // For computational reasons the first element (x0) has to be 1.0
        assert featureVector[0] == 1.0;

        // Sum of theta_j * x_j over all parameters
        double prediction = 0;
        for (int j = 0; j < thetaVector.length; j++) {
            prediction += thetaVector[j] * featureVector[j];
        }
        return prediction;
    }

    public double[] getThetas() {
        return Arrays.copyOf(thetaVector, thetaVector.length);
    }
}
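Because Example 2 expects the caller to supply x₀ = 1.0 as the first array element, a small helper can take care of that step. The following is a minimal sketch; the class `FeatureVectors` and the method `withBias` are hypothetical names and are not part of Example 2.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Example 2.
class FeatureVectors {
    // Returns a new array { 1.0, features[0], features[1], ... }.
    static Double[] withBias(Double... features) {
        Double[] extended = new Double[features.length + 1];
        extended[0] = 1.0; // x0, the constant term that is multiplied by theta0
        System.arraycopy(features, 0, extended, 1, features.length);
        return extended;
    }

    public static void main(String[] args) {
        Double[] featureVector = withBias(1330.0); // house area only
        System.out.println(Arrays.toString(featureVector));
    }
}
```

Keeping this step in one place avoids forgetting the constant term when feature vectors are built in several parts of a program.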
To create a new instance of LinearRegressionFunction, you need to specify the parameter θ. This parameter, or vector, is used to adapt the general linear regression function to the underlying training data. The theta values used in the program are tuned during the training process, based on training examples. The quality of the trained target function can only be as good as the quality of the training data. In the example below, we use LinearRegressionFunction to predict prices based on house size. Given that x₀ must be a constant with a value of 1.0, the target function is instantiated with two theta parameters, which are the result of a learning process. After creating the new instance, the price of a house with an area of 1330 square meters is predicted as shown below:
double[] thetaVector = new double[] { 1.004579, 5.286822 };
LinearRegressionFunction targetFunction = new LinearRegressionFunction(thetaVector);
Double[] featureVector = new Double[] { 1.0, 1330.0 };
double predictedPrice = targetFunction.apply(featureVector);
The figure below shows the prediction line of the target function (blue line). It is obtained by computing the target function for all values of the house area. The chart also contains the price-area pairs used for training.
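The points of such a line can be computed by evaluating the function over a range of areas. The sketch below does this with the theta values from the example above; the class name `PredictionLine`, the method `predict`, and the sampled areas (every 250 m²) are assumptions made for illustration.

```java
class PredictionLine {
    // Theta values from the example above: { theta0, theta1 }.
    static final double[] THETA = { 1.004579, 5.286822 };

    // h(x) = theta0 * 1 + theta1 * area
    static double predict(double area) {
        return THETA[0] * 1.0 + THETA[1] * area;
    }

    public static void main(String[] args) {
        // The sampled areas (step of 250 m²) are an arbitrary choice for this sketch.
        for (double area = 0.0; area <= 1500.0; area += 250.0) {
            System.out.printf("area=%6.1f -> predicted=%10.4f%n", area, predict(area));
        }
    }
}
```

With a single feature, the first theta value is where the line crosses the vertical axis and the second is its slope, which is why two numbers are enough to describe the whole line.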
So far the prediction graph seems to fit well. The position and slope of the line are determined by the theta vector { 1.004579, 5.286822 }. But how can you determine which theta vector is the best fit for your application? Would the function fit better if you changed the first or maybe the second theta parameter? To identify the best-fitting theta vector, you need a utility function that evaluates how well the target function performs.
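One way to picture such a utility function is a score that compares predictions with known labels. The sketch below uses the mean squared error, a common choice that is assumed here and is not necessarily the measure this article series goes on to use. The class `ThetaComparison`, its methods, and the training pairs are hypothetical and made up for illustration.

```java
// Hypothetical sketch: scores a theta vector against labeled (area, price) pairs.
class ThetaComparison {
    // h(x) = theta0 * 1 + theta1 * area
    static double predict(double[] theta, double area) {
        return theta[0] * 1.0 + theta[1] * area;
    }

    // Mean squared error over all training pairs; lower means a better fit.
    static double meanSquaredError(double[] theta, double[] areas, double[] prices) {
        double sum = 0.0;
        for (int i = 0; i < areas.length; i++) {
            double error = predict(theta, areas[i]) - prices[i];
            sum += error * error;
        }
        return sum / areas.length;
    }

    public static void main(String[] args) {
        // Made-up training pairs for illustration only (not data from the article).
        double[] areas  = { 90.0, 101.0, 1330.0 };
        double[] prices = { 480.0, 540.0, 7030.0 };

        double[] thetaFromArticle  = { 1.004579, 5.286822 };
        double[] thetaChangedSlope = { 1.004579, 6.0 }; // second parameter changed

        System.out.println("article theta: " + meanSquaredError(thetaFromArticle, areas, prices));
        System.out.println("changed theta: " + meanSquaredError(thetaChangedSlope, areas, prices));
    }
}
```

Whichever theta vector yields the lower score fits these particular pairs better; with different training data the ranking could change.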
TO BE CONTINUED Translation from English. Author: Gregor Roth, Software Architect, JavaWorld.