How to make machines understand what they do

In my last post, Stupid way of learning, I pointed out that learning bridge bidding just by feeding an algorithm with training data seems to be extremely ineffective. But, in fact, machine learning relies on this idea: you provide hundreds of millions of training examples, hoping that sheer probability is enough to produce a reasonably good predictive system. Do people learn in the same way? Unbelievable! We have at least some general optimizing algorithms.

One can probably learn to play bridge quite fast. But usually the learner has played a lot of other card games before, so they already know what spades, hearts, aces and the rest are.
Even earlier, they played other games, so they learned what it means to win or lose, what points are and how to collect them. And many, many other ideas. All those earlier experiences facilitate the learning process, or maybe make it possible at all.

So how do we build an algorithm which can use some pre-learned ideas to understand all the underlying concepts and, at the same time, avoid human bias? For example, it is a common convention to open the bidding with 1 spade when you have some strength (12-21 honour points) and at least five spades; the same holds for hearts. But why not open 1 spade when you hold five hearts, and 1 heart when you hold five spades? The second system looks less reasonable and, even if used, would be harder to remember and understand. It is worth noticing that more professional systems do have solutions of that type. But all of them are based on common knowledge developed over a long time by many wise people. However, even such knowledge was not enough to beat AlphaGo Zero, which sometimes plays go in a way completely different from the human one.
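As a toy illustration of how easy such human rules are to hard-code (and therefore how much prior knowledge they smuggle in), here is a sketch of the 1-spade opening rule. The hand encoding is a hypothetical one, each card being a string like 'AS' for the ace of spades:

```python
# Milton (honour) point values for the court cards; others count zero.
HCP = {'A': 4, 'K': 3, 'Q': 2, 'J': 1}

def open_one_spade(hand):
    """Sketch of the conventional rule: open 1S with 12-21 HCP and 5+ spades.

    hand: list of 13 card strings like 'AS' (rank letter/digit + suit letter).
    """
    points = sum(HCP.get(card[0], 0) for card in hand)
    spades = sum(card[1] == 'S' for card in hand)
    return 12 <= points <= 21 and spades >= 5
```

The "less reasonable" mirror system would just swap the suit letters here, which shows the rule's content is arbitrary from the machine's point of view.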

Probably a good starting point is to use embeddings. This powerful idea was successfully adopted in word-processing algorithms.
Surely, we can embed the suits and levels of bids, and the suits and ranks of cards. But what about the final game result? Is it enough just to provide a minimax result in points, without teaching the system how the result is calculated, i.e. without a real play to the end?
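Embedding the calls works exactly like embedding words: a trainable lookup table indexed by a vocabulary. Below is a minimal numpy sketch; the vocabulary (35 bids plus pass, double, redouble), the dimension and the random initialization are assumptions, and in Keras this table would be a trainable `Embedding` layer:

```python
import numpy as np

# Hypothetical call vocabulary: 7 levels x 5 denominations, plus pass/x/xx.
DENOMS = ['C', 'D', 'H', 'S', 'NT']
CALLS = [f'{level}{d}' for level in range(1, 8) for d in DENOMS]
CALLS += ['pass', 'x', 'xx']
CALL_INDEX = {c: i for i, c in enumerate(CALLS)}

EMB_DIM = 8  # assumed embedding size
rng = np.random.default_rng(0)
embedding = rng.normal(size=(len(CALLS), EMB_DIM))  # one row per call

def embed_bidding(bidding):
    """Map a sequence of calls to a (length, EMB_DIM) matrix, e.g. as RNN input."""
    return embedding[[CALL_INDEX[c] for c in bidding]]
```

Cards can be embedded the same way, with a 52-entry table instead of a 38-entry one.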

Stupid way of learning

Assume that you play the following game.
You use cards like in Dixit (TM?) – or, better, cards generated by random painting.

1. You are given 2 bunches of 10 cards each.
2. At the beginning you have to choose one card from the first bunch (the arbiter says which cards are forbidden). With no information, you choose a random one and put it on the table.
3. Then some other cards are put on the table.
4. It is your move again, and you choose one card from your first bunch (the arbiter again says which cards are forbidden).
5. The game continues until the arbiter says: “end”.

As a result you get some score (negative or positive).

In the next game you get different bunches of cards (though some cards appear again).
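The game above can be sketched as a tiny environment. The arbiter's rules here are hypothetical stand-ins (it forbids a random half of your remaining cards each turn and stops after a fixed number of rounds); the point is only that the learner sees opaque card identifiers and a single final score, nothing else:

```python
import random

def play_game(hand, rounds=4, seed=None):
    """Play one episode of the abstract game; returns (table, score)."""
    rng = random.Random(seed)
    remaining = list(hand)
    table = []
    for _ in range(rounds):
        forbidden = set(rng.sample(remaining, len(remaining) // 2))
        legal = [c for c in remaining if c not in forbidden]
        card = rng.choice(legal)                      # no information: pick at random
        remaining.remove(card)
        table.append(card)
        table.append(('other', rng.randrange(1000)))  # some other card appears
    return table, rng.uniform(-10, 10)                # score, negative or positive
```

From the learner's side this is all there is: identifiers in, one number out.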

Conclusion:

The pupil has no information about the meaning of cards and bids, and no understanding of the relations between them. So this method of learning bridge seems extremely ineffective. On the other hand, the underlying idea is to avoid using human bidding systems.

So, how to improve learning but avoid contamination?

Contract probabilities of random bidding

In my last post (7NTxx gains over 74% popularity) I wondered why the learning results are so bad. But then I understood that the underlying idea was not correct.

My RNN started from random bidding and was then trained to improve its performance. So, instead of comparing it to proper bidding in the human sense, we should compare the resulting bidding statistics to random-bidding statistics.

Looking at the results below, I see (though not with certainty, since there is no statistical error analysis) that the RNN does better than the random player. Apart from the increased 7NTxx probability, with the RNN the probabilities of high contracts are usually lower than those of “normal” ones.

I ran 100,000 random bidding trials and got the following results:

Contract Probability Count
7NTxx 52.72% 52724
7NT 23.38% 23375
7NTx 17.65% 17653
7S 2.68% 2681
7Sx 1.06% 1058
7Sxx 0.94% 941
7H 0.69% 691
7D 0.26% 256
7Hx 0.19% 190
7C 0.09% 94
7Hxx 0.09% 88
6NT 0.06% 60
7Dx 0.05% 49
6S 0.04% 35
7Cx 0.02% 22
6H 0.02% 21
7Dxx 0.02% 17
6C 0.01% 8
6D 0.01% 5
6NTx 0.01% 5
6Sx 0.00% 3
6Cx 0.00% 3
5H 0.00% 2
5NT 0.00% 2
4S 0.00% 2
5S 0.00% 2
7Cxx 0.00% 2
4H 0.00% 2
5C 0.00% 2
6NTxx 0.00% 1
4C 0.00% 1
4NT 0.00% 1
6Dx 0.00% 1
1C 0.00% 1
6Hx 0.00% 1
4Sx 0.00% 1

We can calculate the probability of final contracts in some simple cases directly, e.g.:

  • the probability of PASS (four passes) is (1/36)^4 (the same for each bidding of the form: some_bid, pass, pass, pass)
  • the probability of a 7NT contract, given that 7NT has been bid, is 0.25
  • the probability of a 7NTx (doubled) contract, given the 7NT bid, is 0.1875
  • and so the probability of a 7NTxx (redoubled) contract, given the 7NT bid, is 1 - 0.25 - 0.1875 = 0.5625

So the probability of the 7NTxx contract (without assuming that 7NT was bid) is not greater than 0.5625.
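These conditional probabilities can be checked exactly with a small enumeration. The model is: after player 0 bids 7NT, every later player picks uniformly among their legal calls (pass is always legal; double only for an opponent of an undoubled 7NT; redouble only for the declaring side of a doubled 7NT), and the auction ends after three consecutive passes:

```python
from fractions import Fraction

def auction_probs():
    """Exact end-state probabilities of a random auction after a 7NT bid by player 0."""
    def legal(player, doubled, redoubled):
        calls = ['pass']
        if not doubled and player % 2 == 1:                 # opponents may double
            calls.append('double')
        if doubled and not redoubled and player % 2 == 0:   # declaring side may redouble
            calls.append('redouble')
        return calls

    def rec(player, doubled, redoubled, passes):
        if passes == 3:
            return {(doubled, redoubled): Fraction(1)}
        out = {}
        calls = legal(player, doubled, redoubled)
        p = Fraction(1, len(calls))
        for call in calls:
            if call == 'pass':
                sub = rec((player + 1) % 4, doubled, redoubled, passes + 1)
            elif call == 'double':
                sub = rec((player + 1) % 4, True, False, 0)
            else:  # redouble
                sub = rec((player + 1) % 4, True, True, 0)
            for key, prob in sub.items():
                out[key] = out.get(key, Fraction(0)) + p * prob
        return out

    probs = rec(1, False, False, 0)
    return {'7NT': probs[(False, False)],
            '7NTx': probs[(True, False)],
            '7NTxx': probs[(True, True)]}
```

The enumeration gives 1/4, 3/16 and 9/16, i.e. exactly 0.25, 0.1875 and 0.5625.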

After a nightly run of 1,320,000 trials the results are:

Contract Probability Count
7NTxx 52.67% 695222
7NT 23.44% 309436
7NTx 17.60% 232270
7S 2.72% 35908
7Sx 1.05% 13823
7Sxx 0.92% 12209
7H 0.69% 9127
7D 0.25% 3254
7Hx 0.19% 2443
7C 0.11% 1492
7Hxx 0.09% 1216
6NT 0.06% 750
7Dx 0.05% 673
6S 0.03% 455
7Cx 0.02% 280
6H 0.02% 250
7Dxx 0.02% 222
6D 0.01% 185
6NTx 0.01% 97
6C 0.01% 96
5NT 0.01% 88
7Cxx 0.01% 73
6Sx 0.00% 61
5S 0.00% 58
5H 0.00% 42
5D 0.00% 38
6Hx 0.00% 34
5C 0.00% 22
6NTxx 0.00% 21
6Dx 0.00% 16
4NT 0.00% 15
4S 0.00% 13
6Sxx 0.00% 12
4D 0.00% 11
5NTx 0.00% 9
6Cx 0.00% 9
5Sx 0.00% 8
3NT 0.00% 6
4H 0.00% 5
5Hx 0.00% 5
3D 0.00% 5
4C 0.00% 4
5Cx 0.00% 4
6Hxx 0.00% 4
2H 0.00% 4
3C 0.00% 4
4NTx 0.00% 3
3H 0.00% 2
2NT 0.00% 2
1C 0.00% 2
6Dxx 0.00% 2
2S 0.00% 2
2C 0.00% 2
1D 0.00% 1
5Dx 0.00% 1
4Sx 0.00% 1
1NT 0.00% 1
3S 0.00% 1
1H 0.00% 1
4Dx 0.00% 1

7NTxx gains over 74% popularity ;)

After 2 days of training, the model generates the following contracts over random deals (an example on 1,000 random deals):


Contract Probability Count
7NTxx 74.80% 748
7Sx 7.40% 74
7Sxx 6.50% 65
2C 3.30% 33
PASS 1.10% 11
6NT 1.10% 11
1Hx 0.90% 9
1H 0.80% 8
7D 0.70% 7
1C 0.60% 6
6Sx 0.40% 4
1Cx 0.30% 3
6C 0.30% 3
6Cx 0.30% 3
2Hx 0.20% 2
7Hx 0.20% 2
6NTx 0.20% 2
5S 0.10% 1
5D 0.10% 1
6H 0.10% 1
3NT 0.10% 1
2Cx 0.10% 1
1Dx 0.10% 1
6D 0.10% 1
7NTx 0.10% 1
1S 0.10% 1

Surely, it is too optimistic.

The learning idea is as follows:

I initialize the RNN with random values, then, in a loop:

1. generate N random deals and biddings (the player plays with itself; of course, it does not know the other hands)

2. calculate the result of the contract (real points) using DDS (a double dummy solver)

3. compare the real result with… it should be the minimax result but, to make it simple, I compare it with the Milton-points expected result. This gives a score for the resulting contract

4. then, for each sub-bidding, assign a value to each bid (for the declarer's pair the result, for the other pair minus the result)

5. train on the obtained training examples
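Step 4 of the loop above can be sketched concretely. The encoding of a bidding as (player, call) pairs is an assumption for illustration; players 0-3 sit in pairs {0, 2} and {1, 3}, the declaring pair's bids get the deal's score as a target, and the defenders' bids get minus the score:

```python
def bid_targets(bidding, declarer, score):
    """Turn one self-play deal into per-bid training targets.

    bidding:  list of (player, call) pairs, players numbered 0-3
    declarer: player number of the declarer (pair = same parity)
    score:    score of the final contract for the declaring pair
    """
    targets = []
    for player, call in bidding:
        sign = 1 if player % 2 == declarer % 2 else -1
        targets.append((call, sign * score))
    return targets
```

Every prefix of the auction then yields one training example per bid, signed by side.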


So, why does the software generate 7NTxx?

Surely, the double is correct, while the 7NT bid and the redouble are not.

Hello world!

Welcome to Bridge Zero Project.

I am a bridge player and a neural network developer (Python, Keras). Here I'm trying to build a model that learns bridge bidding by itself, i.e. without using human knowledge. The purpose is to “generate” an optimal bidding solution.

What does “optimal” mean? The optimal bid depends on the system used by the pair, and surely the optimal system depends on the system used by the opponents. So, by an optimal system I understand a system that bids optimally against all possible systems (including itself).

It is possible that there is no minimax solution, and there may exist systems that can beat the optimal universal system.