
Machine Learning for Java Developers Part 2

Machine Learning for Java Developers, Part 1

Objective Function Estimation

Let us recall that the target function, also known as the prediction function, is the result of the training process. Mathematically, the challenge is to find a function that takes a variable x as input and returns the predicted value y.
y = hθ(x) (the prediction function; in Part 1 this was the linear regression function hθ(x) = θ₀ + θ₁ · x)
In machine learning, a cost function J(θ) is used to calculate the error value, or "cost", of a given objective function.
J(θ) = (1 / 2m) · Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)², summed over all m training examples
The cost function shows how well the model fits the training data. To determine the cost of the objective function shown above, you need to calculate the squared error for each example house (i). The error is the distance between the calculated value y and the real price of the house from example (i).
[Plot: house price vs. area with the prediction line; the error of each price/area training pair is shown as a vertical dashed red line]
For example, the real price of a house with an area of 1330 m² is €6,500,000, while the price predicted by the trained objective function is €7,032,478: the difference (or error) is €532,478. You can also see this difference in the graph above, shown as a vertical dashed red line for each price/area training pair. To determine the cost of the trained objective function, you sum the squared error for each house in the example set and compute the mean value. The smaller the cost value J(θ), the more accurate the predictions of the objective function will be. Listing 3 shows a simple Java implementation of a cost function that takes as input an objective function, a list of training data records, and the labels associated with them. The prediction values are calculated in a loop, and the error is computed by subtracting the real price value (taken from the label). The squared errors are then summed and the mean error cost is calculated. The cost is returned as a value of type double:

Listing-3

public static double cost(Function<Double[], Double> targetFunction,
                          List<Double[]> dataset,
                          List<Double> labels) {
   int m = dataset.size();
   double sumSquaredErrors = 0;

   // calculate the squared error ("gap") for each training example
   // and add it to the sum
   for (int i = 0; i < m; i++) {
      // get the feature vector of the current example
      Double[] featureVector = dataset.get(i);
      // predict the value and calculate the error based on the real
      // value (the label)
      double predicted = targetFunction.apply(featureVector);
      double label = labels.get(i);
      double gap = predicted - label;
      sumSquaredErrors += Math.pow(gap, 2);
   }

   // calculate and return the mean error cost (the smaller the better)
   return (1.0 / (2 * m)) * sumSquaredErrors;
}
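As a quick usage check (not part of the original listing), the cost of a freshly initialized target function could be computed like this, assuming the LinearRegressionFunction class from Part 1 and the dataset and labels lists shown later in this article:

LinearRegressionFunction targetFunction = new LinearRegressionFunction(new double[] { 1.0, 1.0 });
double initialCost = cost(targetFunction, dataset, labels);
System.out.println("cost J(theta) = " + initialCost); // the smaller, the better the fit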

Learning the target function

Although the cost function helps evaluate the quality of the objective function and theta parameters, you still need to find the most suitable theta parameters. You can use the gradient descent algorithm for this.

Gradient Descent

Gradient descent minimizes the cost function: it is used to find the theta parameters that produce the minimum cost J(θ) on the training data. Here is a simplified algorithm for calculating new, better-fitting theta values:
θⱼ := θⱼ − α · (1 / m) · Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · xⱼ⁽ⁱ⁾, updated simultaneously for every parameter j
So, the parameters of the theta vector improve with each iteration of the algorithm. The learning rate α controls the size of the computation step at each iteration. These calculations are repeated until "good" theta values are found. For example, the linear regression function below has three theta parameters:
hθ(x) = θ₀ · x₀ + θ₁ · x₁ + θ₂ · x₂ (with x₀ = 1)
At each iteration, a new value will be calculated for each of the theta parameters: θ₀, θ₁, and θ₂. After each iteration, a new, better-fitting LinearRegressionFunction instance can be created using the new theta vector {θ₀, θ₁, θ₂}. Listing 4 shows the Java code for the gradient descent algorithm. The thetas of the regression function will be trained using the training data, the label data, and the learning rate α. The result is an improved objective function with the new theta parameters. The train() method will be called again and again, each time passing in the new objective function and the new theta parameters from the previous calculation. These calls are repeated until the tuned objective function reaches a minimum plateau:

Listing-4

public static LinearRegressionFunction train(LinearRegressionFunction targetFunction,
                                             List<Double[]> dataset,
                                             List<Double> labels,
                                             double alpha) {
   int m = dataset.size();
   double[] thetaVector = targetFunction.getThetas();
   double[] newThetaVector = new double[thetaVector.length];

   // calculate a new theta value for each element of the theta array
   for (int j = 0; j < thetaVector.length; j++) {
      // sum up (error * feature value) over all training examples
      double sumErrors = 0;
      for (int i = 0; i < m; i++) {
         Double[] featureVector = dataset.get(i);
         double error = targetFunction.apply(featureVector) - labels.get(i);
         sumErrors += error * featureVector[j];
      }

      // calculate the new theta value
      double gradient = (1.0 / m) * sumErrors;
      newThetaVector[j] = thetaVector[j] - alpha * gradient;
   }

   return new LinearRegressionFunction(newThetaVector);
}
To ensure that the cost continually decreases, you can run the cost function J(θ) after each training step. The cost should drop with every iteration; if it does not, the learning rate is too large and the algorithm has simply stepped over the minimum, so gradient descent diverges (a sketch of such a monitoring loop follows the plots below). The plots below show the objective function with the newly computed theta parameters, starting from the initial theta vector {1.0, 1.0}. The left column shows the prediction function after 50 iterations; the middle column after 200 iterations; and the right column after 1000 iterations. They show that the cost decreases with each iteration and the new objective function fits better and better. After 500-600 iterations, the theta parameters no longer change significantly and the cost reaches a stable plateau; after that, the target function's accuracy cannot be improved this way.
[Plots: the prediction function after 50 iterations (left), 200 iterations (middle), and 1000 iterations (right)]
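A minimal training loop with this kind of cost monitoring might look as follows; a sketch built on the cost() and train() methods from Listings 3 and 4, with an arbitrary learning rate and iteration count:

LinearRegressionFunction targetFunction = new LinearRegressionFunction(new double[] { 1.0, 1.0 });
double previousCost = cost(targetFunction, dataset, labels);

for (int i = 0; i < 1000; i++) {
   targetFunction = train(targetFunction, dataset, labels, 0.0005);

   double currentCost = cost(targetFunction, dataset, labels);
   // a rising cost means the learning rate is too large and the step
   // has overshot the minimum
   if (currentCost > previousCost) {
      throw new IllegalStateException("cost is increasing - reduce the learning rate alpha");
   }
   previousCost = currentCost;
}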
In this case, even though the cost no longer decreases significantly after 500-600 iterations, the objective function is still not optimal: it underfits the data. In machine learning, the term "underfitting" means that the learning algorithm does not capture the underlying trends in the data. Based on real-life experience, one would expect the price per square meter to decrease for larger properties, from which we can conclude that the model used for training does not fit the data well enough. Underfitting is often caused by an oversimplified model, and that is the case here: the objective function is too simple, using a single feature, the area of the house, and that information is not enough to accurately predict the price.

Adding features and scaling them

If you find that your objective function does not fit the problem you are trying to solve, it needs to be adjusted. A common way to correct underfitting is to add more features to the feature vector. In the house-price example, you can add characteristics such as the number of rooms or the age of the house. That is, instead of describing a house with the single-feature vector {size}, you can use a vector with several values, for example {size, number-of-rooms, age}. In some cases, the number of features in the available training data is not enough. Then it is worth trying polynomial features, computed from existing ones. For example, you can extend the objective function for determining the price of a house to include a computed squared-size feature (size², i.e. x²):
hθ(x) = θ₀ + θ₁ · size + θ₂ · size²
Using multiple features requires feature scaling, which standardizes the range of the different features. Thus, the range of values of the size² feature is significantly larger than the range of the size feature. Without feature scaling, size² would unduly influence the cost function: the error introduced by the size² feature would be significantly larger than the error introduced by the size feature. A simple feature scaling algorithm is given below:
scaled(xⱼ) = (xⱼ − μⱼ) / (maxⱼ − minⱼ), where μⱼ, minⱼ, and maxⱼ are the mean, minimum, and maximum of feature j over the training data
This algorithm is used by the FeaturesScaling class in the example code below. FeaturesScaling provides a factory method for creating a scaling function tuned to the training data: internally, the training data instances are used to calculate the average, minimum, and maximum values, and the resulting function takes a feature vector and produces a new one with scaled features.
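The implementation of FeaturesScaling is not included in the article. A minimal sketch, assuming the mean-normalization formula above and the convention that column 0 holds the constant bias feature (always 1.0), might look like this:

import java.util.List;
import java.util.function.Function;

public class FeaturesScaling {

   public static Function<Double[], Double[]> createFunction(List<Double[]> dataset) {
      int n = dataset.get(0).length;
      double[] mean = new double[n];
      double[] min = new double[n];
      double[] max = new double[n];

      // compute the mean, minimum, and maximum of each feature column
      for (int j = 0; j < n; j++) {
         min[j] = Double.POSITIVE_INFINITY;
         max[j] = Double.NEGATIVE_INFINITY;
         for (Double[] example : dataset) {
            mean[j] += example[j];
            min[j] = Math.min(min[j], example[j]);
            max[j] = Math.max(max[j], example[j]);
         }
         mean[j] /= dataset.size();
      }

      // the returned function scales a feature vector using the precomputed
      // statistics; the bias feature in column 0 is left untouched
      return featureVector -> {
         Double[] scaled = new Double[featureVector.length];
         scaled[0] = featureVector[0];
         for (int j = 1; j < featureVector.length; j++) {
            double range = max[j] - min[j];
            scaled[j] = (range == 0) ? 0.0 : (featureVector[j] - mean[j]) / range;
         }
         return scaled;
      };
   }
}

Feature scaling is then required both for the learning process and for the prediction process, as shown below: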
// create the dataset
List<Double[]> dataset = new ArrayList<>();
dataset.add(new Double[] { 1.0, 90.0, 8100.0 });   // feature vector of house#1
dataset.add(new Double[] { 1.0, 101.0, 10201.0 }); // feature vector of house#2
dataset.add(new Double[] { 1.0, 103.0, 10609.0 }); // ...
//...

// create the labels
List<Double> labels = new ArrayList<>();
labels.add(249.0); // price label of house#1
labels.add(338.0); // price label of house#2
labels.add(304.0); // ...
//...

// create the scaling function and the scaled dataset
Function<Double[], Double[]> scalingFunc = FeaturesScaling.createFunction(dataset);
List<Double[]> scaledDataset = dataset.stream().map(scalingFunc).collect(Collectors.toList());

// create the target function, initialize the thetas, and train it
// using a learning rate of 0.1
LinearRegressionFunction targetFunction = new LinearRegressionFunction(new double[] { 1.0, 1.0, 1.0 });
for (int i = 0; i < 10000; i++) {
   targetFunction = Learner.train(targetFunction, scaledDataset, labels, 0.1);
}

// predict the price of a house with an area of 600 m²
Double[] scaledFeatureVector = scalingFunc.apply(new Double[] { 1.0, 600.0, 360000.0 });
double predictedPrice = targetFunction.apply(scaledFeatureVector);
As more and more features are added, the objective function fits the training data better and better, but be careful: if you go too far and add too many features, you may end up with an objective function that is overfit.

Overfitting and cross-validation

Overfitting occurs when the objective function or model fits the training data too well, so much so that it captures noise or random variations in the training data. An example of overfitting is shown in the rightmost graph below:
[Plots: three models of increasing complexity fitted to the same data; the rightmost one overfits the training data]
An overfit model performs very well on the training data, but it will perform poorly on real, unseen data. There are several ways to avoid overfitting:
  • Use a larger data set for training.
  • Use fewer features as shown in the graphs above.
  • Use an improved machine learning algorithm that takes regularization into account (see the sketch after this list).
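As an illustration of the last point, a ridge (L2) penalty can be folded into the gradient descent step of Listing 4. This is a sketch, not part of the original article; lambda is an assumed hyperparameter that controls how strongly large theta values are penalized:

// sketch: theta update from Listing 4 extended with an L2 (ridge) penalty;
// lambda is a hypothetical regularization-strength parameter
double gradient = (1.0 / m) * sumErrors;
if (j == 0) {
   // by convention the bias parameter theta0 is not regularized
   newThetaVector[j] = thetaVector[j] - alpha * gradient;
} else {
   double regularization = (lambda / m) * thetaVector[j];
   newThetaVector[j] = thetaVector[j] - alpha * (gradient + regularization);
}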
If a prediction algorithm overfits the training data, you need to eliminate the features that do not meaningfully contribute to its accuracy. The difficulty lies in finding which features affect prediction accuracy more significantly than others. As the plots show, overfitting can be spotted visually. This works well for plots with two or three dimensions, but it becomes difficult to plot and evaluate a graph once you use more than two features. With cross-validation, you re-test a trained model with data that was unknown to the algorithm during training. The available labeled data should be divided into three sets:
  • training data;
  • validation data;
  • test data.
Here, 60 percent of the labeled house records should be used to train variants of the target algorithm. After training, half of the remaining (previously unused) data is used to verify that the trained algorithms perform well on unseen data; typically, the variant that performs best is selected for use. The remaining data is then used to calculate the error value of the finally selected model. There are other cross-validation techniques, such as k-fold, but I will not describe them in this article.
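A 60/20/20 split could be sketched like this (not shown in the original article; the index shuffle keeps feature vectors and labels aligned):

// build a shuffled index list so that features and labels stay aligned
List<Integer> indices = new ArrayList<>();
for (int i = 0; i < dataset.size(); i++) {
   indices.add(i);
}
Collections.shuffle(indices, new Random(42));

int trainEnd = (int) (indices.size() * 0.6);      // first 60% for training
int validationEnd = (int) (indices.size() * 0.8); // next 20% for validation

List<Double[]> trainingData = new ArrayList<>();
List<Double> trainingLabels = new ArrayList<>();
for (int i = 0; i < trainEnd; i++) {
   trainingData.add(dataset.get(indices.get(i)));
   trainingLabels.add(labels.get(indices.get(i)));
}
// the validation set (indices trainEnd..validationEnd) and the test set
// (indices validationEnd..end) are assembled the same way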

Machine learning tools and Weka framework

Most frameworks and libraries provide an extensive collection of machine learning algorithms. In addition, they provide a convenient high-level interface for training, testing, and processing data models. Weka is one of the most popular frameworks for the JVM: a practical Java library that also includes graphical tools for validating models. The example below uses the Weka library to create a training dataset that contains both features and a label. The setClassIndex() method marks the label attribute; in Weka, a label is defined as a class:
// define attributes for features and labels
ArrayList<Attribute> attributes = new ArrayList<>();
Attribute sizeAttribute = new Attribute("sizeFeature");
attributes.add(sizeAttribute);
Attribute squaredSizeAttribute = new Attribute("squaredSizeFeature");
attributes.add(squaredSizeAttribute);
Attribute priceAttribute = new Attribute("priceLabel");
attributes.add(priceAttribute);


// create and fill the training dataset with 5000 examples
Instances trainingDataset = new Instances("trainData", attributes, 5000);
trainingDataset.setClassIndex(trainingDataset.numAttributes() - 1);
Instance instance = new DenseInstance(3);

instance.setValue(sizeAttribute, 90.0);
instance.setValue(squaredSizeAttribute, Math.pow(90.0, 2));
instance.setValue(priceAttribute, 249.0);
trainingDataset.add(instance);
instance = new DenseInstance(3);
instance.setValue(sizeAttribute, 101.0);
...
The dataset and its instances can be saved to and loaded from a file. Weka uses ARFF (Attribute-Relation File Format), which is also supported by Weka's graphical tools (a save/load sketch follows the training call below). The dataset is used to train the objective function, which Weka calls a classifier. First of all, you must define the objective function. The code below creates an instance of the LinearRegression classifier. This classifier is trained by calling buildClassifier(), which selects theta parameters based on the training data in search of the best target model. With Weka, you don't have to worry about setting the learning rate or the number of iterations; Weka also performs feature scaling on its own.
Classifier targetFunction = new LinearRegression();
targetFunction.buildClassifier(trainingDataset);
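Persisting the dataset in ARFF format might look like this; a hedged sketch using Weka's ArffSaver and ArffLoader converter classes from weka.core.converters, with a made-up file name:

// save the training data to an ARFF file...
ArffSaver saver = new ArffSaver();
saver.setInstances(trainingDataset);
saver.setFile(new File("housePrices.arff"));
saver.writeBatch();

// ...and load it back; the class index is not stored in the file,
// so it has to be set again
ArffLoader loader = new ArffLoader();
loader.setFile(new File("housePrices.arff"));
Instances loadedDataset = loader.getDataSet();
loadedDataset.setClassIndex(loadedDataset.numAttributes() - 1);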
Once the classifier has been trained, the objective function can be used to predict the price of a house, as shown below:
Instances unlabeledInstances = new Instances("predictionset", attributes, 1);
unlabeledInstances.setClassIndex(unlabeledInstances.numAttributes() - 1);
Instance unlabeled = new DenseInstance(3);
unlabeled.setValue(sizeAttribute, 1330.0);
unlabeled.setValue(squaredSizeAttribute, Math.pow(1330.0, 2));
unlabeledInstances.add(unlabeled);

double prediction = targetFunction.classifyInstance(unlabeledInstances.get(0));
Weka provides an Evaluation class for testing a trained classifier or model. In the code below, a separate validation dataset is used to avoid biased results. The measured results (the error cost) are printed to the console. Typically, evaluation results are used to compare models that were trained using different machine learning algorithms, or variations thereof:
Evaluation evaluation = new Evaluation(trainingDataset);
evaluation.evaluateModel(targetFunction, validationDataset);
System.out.println(evaluation.toSummaryString("Results", false));
The example above uses linear regression, which predicts a numeric value, such as the price of a house, based on input values. Linear regression supports predicting continuous numeric values. To predict binary values ("Yes" and "No"), you need other machine learning algorithms, for example decision trees, neural networks, or logistic regression.
// using logistic regression
Classifier targetFunction = new Logistic();
targetFunction.buildClassifier(trainingDataset);
You can use one of these algorithms, for example, to predict whether an email message is spam, to predict the weather, or to predict whether a house will sell well. If you want to teach your algorithm to predict the weather or whether a house will sell quickly, you need a different dataset, e.g. one with a topseller label:
// use the topseller label attribute instead of the price label attribute
ArrayList<String> classVal = new ArrayList<>();
classVal.add("true");
classVal.add("false");

Attribute topsellerAttribute = new Attribute("topsellerLabel", classVal);
attributes.add(topsellerAttribute);
This dataset is used to train a new topseller classifier. Once it has been trained, the prediction call returns the index of the label class, which can be used to obtain the predicted value:
int idx = (int) targetFunction.classifyInstance(unlabeledInstances.get(0));
String prediction = classVal.get(idx);

Conclusion

Although machine learning is closely related to statistics and uses many mathematical concepts, machine learning toolkits let you start integrating machine learning into your programs without deep knowledge of mathematics. However, the better you understand the underlying machine learning algorithms, such as the linear regression algorithm explored in this article, the better you will be able to choose the right algorithm and tune it for optimal performance. Translated from English. Author: Gregor Roth, Software Architect, JavaWorld.