How to make machines understand what they do

In my last post, "Stupid way of learning", I pointed out that learning bridge bidding just by feeding an algorithm with training data seems to be extremely inefficient. But, in fact, machine learning relies on exactly this idea. You provide hundreds of millions of training examples, hoping that this is enough to produce a reasonably good predictive system by sheer probability. Do people learn in the same way? Hard to believe! We have at least some general optimizing algorithms.

Probably, one can learn to play bridge quite fast. But usually he has already played a lot of other card games, so he knows what spades, hearts, aces and the rest are.
Also, usually even earlier, she played other games, so she learned what it means to win or lose, what points are and how they are collected. And many, many other ideas. All those earlier experiences facilitate the learning process or, maybe, make it possible at all.

So how do we build an algorithm that can use some pre-learned ideas to understand all the underlying concepts and, at the same time, avoid human bias? For example, it is common to open the bidding with 1 spade when you have some strength (12-21 honour points) and at least five spades. The same holds for hearts. But why not open 1 spade when you have five hearts, and 1 heart when you have five spades? The second system looks less reasonable and, even if used, would be harder to remember and understand. It is worth noting that more professional systems do have solutions of that type. But all of them are based on common knowledge developed over a long time by many wise people. However, even such knowledge was not enough to beat AlphaGo Zero, which sometimes plays Go in a way completely different from the human one.
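To make the two systems concrete, here is a minimal Python sketch. The hand format (a dict from suit letter to a string of ranks), the function names and the tie-break (spades are checked before hearts) are my own assumptions for illustration; only the 12-21 point range and the five-card requirement come from the rule above. Honour points use the usual count: ace 4, king 3, queen 2, jack 1.

```python
# Usual honour-point count: A=4, K=3, Q=2, J=1.
HONOUR_POINTS = {"A": 4, "K": 3, "Q": 2, "J": 1}


def honour_points(hand):
    """Sum honour points over all cards; hand maps suit letter -> string of ranks."""
    return sum(HONOUR_POINTS.get(rank, 0) for cards in hand.values() for rank in cards)


def natural_opening(hand):
    """Open 1 of a major with 12-21 points and at least five cards in that suit.

    Assumption for this sketch: spades are checked before hearts.
    Returns None when the rule does not apply.
    """
    if not 12 <= honour_points(hand) <= 21:
        return None
    for suit in ("S", "H"):
        if len(hand[suit]) >= 5:
            return "1" + suit
    return None


# The "less reasonable" system: same conditions, but the two bids are exchanged.
SWAP = {"1S": "1H", "1H": "1S"}


def swapped_opening(hand):
    bid = natural_opening(hand)
    return SWAP.get(bid, bid)


# Hypothetical example hand: five spades, 13 honour points.
example = {"S": "AKJ52", "H": "Q3", "D": "K84", "C": "T97"}
print(natural_opening(example), swapped_opening(example))
```

Both functions carry exactly the same information about the hand. The only difference is the label attached to it, which is why an algorithm without human habits has no reason to prefer one over the other.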

Probably a good starting point is to use embeddings. This powerful idea has been successfully adopted in word processing algorithms.
Surely, we can embed the suits and levels of bids, and the suits and ranks of cards. But what about the final game result? Is it enough just to provide a minimax result in points without teaching the system how the result is calculated, i.e. without really playing the deal to the end?
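A rough idea of what such embeddings could look like, as a pure-Python sketch. Everything here is an assumption made for illustration: the vector size, the random initialisation, the table names, and in particular the choice to share one suit table between bids and cards. In a real system these vectors would be learned during training, not left random.

```python
import random

DIM = 4  # embedding size, chosen arbitrarily for this sketch
rng = random.Random(0)


def make_table(keys, dim=DIM):
    """One randomly initialised vector per key; training would adjust these."""
    return {key: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for key in keys}


# "NT" is only used by bids; cards use the four suits.
suit_table = make_table(["C", "D", "H", "S", "NT"])
level_table = make_table(range(1, 8))      # bid levels 1..7
rank_table = make_table("23456789TJQKA")   # card ranks, T = ten


def embed_bid(level, suit):
    """Bid vector = level vector followed by suit vector."""
    return level_table[level] + suit_table[suit]


def embed_card(suit, rank):
    """Card vector = suit vector followed by rank vector."""
    return suit_table[suit] + rank_table[rank]


one_spade = embed_bid(1, "S")
ace_of_spades = embed_card("S", "A")
print(len(one_spade), len(ace_of_spades))
```

Because the suit table is shared, the bid 1 spade and the ace of spades contain the same "spade" vector, so whatever the system learns about spades from the cards is available when it looks at the bids.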
