Hello All!
I wanted to start a thread in the community to discuss implementations and solutions to the Kaggle X-Mas challenge. For those of you who don’t know, Kaggle it is a site intended for promoting data science through courses and challenges. Every X-Mas they have a special optimization challenge. You can find more information below on this years challenge in the link below:
https://www.kaggle.com/c/santa-2021/overview
A quick summary is: “Santa is creating a new streaming service and will play a 24/7 loop of 7 films. He would like the showing sequence to be optimal. Hence 3 groups of elves are watching all possible sequences (7! = 5040) of 7 movies to evaluate what is the best sequence.
So, it has to do with permutations and TSP’s. Permutations, because we need to identify all possible combinations of sequences. And TSP’s because the problem can be transformed to a multi Traval Salesman Problem. Every salesman in this case is a group of Elves, and every city is a sequence of (7) films that must seen by at least 1 group (5040 cities in total). In order to optimize the sequence, we can evaluate the distance between permutations as below:
For example, the two sequences “1234567” and “4567123” could be connected as 123{4567]123}. The films between h ]’s are the first sequence, and the films between { }’s are the second sequence. Hence if they watch the films in this order (1234567123) they are able to watch 2 (7 long) sequences in one go, viewing a total of 10 movies as opposed to 14 (7 x 2), improving the total necessary time to watch all sequences.
This is a difficult problem. TSP is already a challenging problem. Having multiple salesmen, a high quantity of cities and asymmetrical distances only makes it harder.
There are two additional business rules that need to be taken into account:
- All elf groups must watch a subset of the permutations (that start with 2 specific movies). So there is overlap in visiting these specific cities, but not all of them (120 of the total 5040).
- There are 2 wildcards that can be placed amid a sequence. That wildcard serves as any of the movies; hence you can cover a lot by using this.
I will share a project and implementation I have done for this below and would love to discuss on how to solve this and improve!
Thanks