JavaRush /Java Blog /Random EN /Chain of words task

Chain of words task

Published in the Random EN group
While taking the Java.Multithreading course, I was very interested in the task of composing a chain of words: https://javarush.com/tasks/com.javarush.task.task22.task2209 . The task itself: given a chain of words, it is necessary to order them in such a way that for each pair of words the last letter of the first word coincides with the first letter of the next one. Naturally, the final chain must include all the words of the original chain once. For example, "Kiev New York Amsterdam Vienna Melbourne" -> "Amsterdam Melbourne New York Kiev Vienna" or "ab ac bc" -> "ab bc ca". The problem interested me from the point of view of combinatorics and finding a solution. Word chain task - 1<h2>1. Algorithm</h2><h3>1.1. Brute-force</h3>The simplest algorithm that comes to mind is searching through all possible combinations until the desired one is found. The main disadvantage is a sharp and accelerating increase in the number of combinations - the number of combinations for n words will be equal to the factorial and the complexity of the algorithm is O(n!). For 3 words, the following set is obtained - 6 combinations: Word chain task - 1For 4 words: Chain of words task - 2Wikipedia suggests that for 10 words there will be 3.2 million combinations, and for 100 words - ~10^158 combinations. Going through such a number of combinations looks somewhat... difficult... <h3>1.2. Increment</h3>So, we need some other algorithm, and not just enumeration, for example, sequential joining: (1) Take the first word, (2) Take the next one and try to join it (on the left or right) to the first. If it works, repeat for all remaining words. (3) If the initial list is empty, we have found the sequence; if it is not empty, failure, go to (4) (4) We take as the initial word not the first, but the second, etc. Chain of words task - 3If I'm not mistaken, the complexity of this algorithm is O(n^2), which is much less than the first one. The problem is that there can be sequences (quite long) for which starting with any character does not lead to a solution - the line becomes looped and the remaining characters cannot be appended. In principle, further options are possible - if it doesn’t work, you can go in the reverse order (reverse the line and start over), or randomly mix up the words, etc. This is similar to playing thimbles - if it doesn’t work out, we try to mix the cubes in the box until we get the desired sequence. How long it will take and whether it will work is unclear. <h3>1.3. Cyclic sequences</h3>This algorithm is based on a simple idea: if we have 2 sequences that satisfy the conditions of the problem, they can be combined into one if they contain intersecting characters. Chain of words task - 4And the algorithm will be like this: (1) Split the original sequence into x minimal cyclic sequences that satisfy the conditions of the problem (2) Combine the cyclic sequences into one final sequence that satisfies the conditions of the problem. It is worth noting that words containing the same first and last letters in this case can be considered as minimal cyclic sequences. And they can either be attached to other words at stage (1) or left for stage (2). <h2>2. Implementation</h2>The code is posted on github , further, if necessary, I will mention [functions] that implement the described task. <h3>2.1. Elementary brick</h3>During implementation, you will have to very often refer to the first and last letters of a word, so a class containing, in addition to the word itself, also its first and last letters was used as an elementary brick:
class Word {
    private final String string;
    private final char firstLetter;
    private final char lastLetter;

    Word(String string) {
        this.string = string;
        firstLetter = Character.toLowerCase(string.charAt(0));
        lastLetter = Character.toLowerCase(string.charAt(string.length() - 1));
    }
}
<h3>2.2. Checking the original sequence</h3>Before starting the main search algorithm, you should make sure that the problem is, in principle, solvable. I implemented this using the following checks (hereinafter in this paragraph, S is the source string, W is the Word array created based on it):
  1. S is not null, length S > 0 (well, that's obvious)
  2. In W, the number and types of first and last characters are the same [checkLetters()] .
    For example, the string "ab ba" is solvable and contains firstLetters = {a=1, b=1}, lastLetters = {a=1, b=1}, And the string "ab ba bc" is undecidable and contains firstLetters = {a=
    1 , b=2 }, lastLetters = {a=1, b=1, c=1 }.
  3. All words in W are connected to each other [checkConnectivity()] . For example, the word "ab" provides the connectedness [a,b], in the sequence "ab bc cd da" the connected symbols [a,b,c,d], both of these sequences are decidable. But the sequence "ab bc ca fg gf" is unsolvable and contains 2 disconnected blocks: {[a,b,c] and [f,g]}. I checked the connectivity using List<set<character>> as follows:
    • 3.1. We take any word, check whether its first/last characters are contained in any Set<character>
    • 3.2. If none of the Set<character> contains its first or last letter - create a new Set<character> with them
    • 3.3. If only 1 Set<character> contains its first/last letter - add another letter to this Set
    • 3.4. If 1 set contains the first letter, and the second the last, we combine these sets
    • 3.5. Repeat for all words
    • 3.6. If in the end List<set<character>> contains only 1 set, the sequence is connected , if more than 1, then it is not connected and cannot be resolved.
<h3>2.3. Search for the final sequence [generateResultLists()]</h3>To search, we had to create an additional class CycleList, which contains a List<word> that satisfies the conditions of the task, as well as a Set<character>, which contains many characters from the List<words>. The main "feature" of this class is its ability to rearrange so that the List begins (and ends) with any necessary letter included in the Set<character>. For example, (regroup(b)) "ab bc ca" -> "bc ca ab". This allows you to easily merge 2 such sheets that have an intersection of symbols - it is enough to regroup both of them by the intersecting symbol and you can add one sheet to the other (the algorithm described above is implemented quite easily).
private static class CycleList {
        List<word> list;
        Set<character> characterSet = new HashSet<>();

        public CycleList(List<word> list) {
            this.list = new ArrayList<>(list);
            list.forEach(w -> {
                characterSet.add(w.getFirstLetter());
                characterSet.add(w.getLastLetter());
            });
        }

        void regroup(char ch) {
            int first = 0;
            while (first++ < list.size())
                if (list.get(first).getFirstLetter() == ch) break;
            List<word> tempList = new ArrayList<>(list.size());
            list.stream().skip(first).forEachOrdered(tempList::add);
            list.stream().limit(first).forEachOrdered(tempList::add);
            list = tempList;
        }
    }
<h2>3. In conclusion</h2>I really liked the problem, I had to rack my brains from the point of view of the algorithm, but I don’t regret it at all. While combing the resulting code, I began to better understand working with lambda expressions and collections in Java. The described code works quite quickly; on my no longer young PC, an array of 30,000 random words is sorted in about 1 second. Chain of words task - 6In addition to the described solution, the code on Github also contains:
  1. Random Sequence Generator
  2. The SequenceAlgorithm class, which implements the algorithm from section (1.2)
  3. Several sequences to test, for example my implementation of SequenceAlgorithm does not find a solution for this sequence (neither forward nor backward): LK Ud aa LL WY OK lQ cK MK MM UU ll kk ss LJ HH dU dI aQ HJ ky ik mL MW jT KO JL lz ki Us WW QQ zl jj KK Id yk sU YW WB Ql KL JJ LL KK Tj JH OO ll Kc WM KM kk aC Lm Qa dd BW Ca WW
Comments
TO VIEW ALL COMMENTS OR TO MAKE A COMMENT,
GO TO FULL VERSION