Stochastic Adaptive Dynamics of a Simple Market as a Non-Stationary Multi-Armed Bandit Problem
We develop a dynamic monopoly pricing model as a non-stationary multi-armed bandit problem. At each time, the monopolist chooses a price in a finite set and each customer decides stochastically and independently whether or not to visit his store. Each customer is characterized by two parameters: an ability-to-pay and a probability of visiting. Our problem is non-stationary for the monopolist because each customer modifies his visit probability with experience. We define an ex-ante optimal price for our problem and then examine two different ways of learning this optimal price. In the first part, assuming the monopolist knows everything but the ability-to-pay, we suggest a simple counting rule based on purchase behavior which allows him to obtain enough information to compute the optimal price. In the second part, assuming no particular knowledge, we consider the case in which the monopolist uses an adaptive stochastic algorithm. When learning is easy (difficult), our simulations suggest that the monopolist chooses (fails to choose) the optimal price on each sample path.
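The abstract's setup can be illustrated with a minimal simulation sketch. This is not the paper's exact model or algorithm: the price set, the customer parameters, the epsilon-greedy price-selection rule, and the visit-probability update step are all assumptions chosen for illustration. It only shows the structure of the problem: prices as arms, and a reward process made non-stationary by customers updating their visit probabilities.

```python
# Illustrative sketch (hypothetical parameters, not the paper's model):
# a monopolist runs epsilon-greedy over a finite price set while each
# customer raises or lowers his visit probability after each experience,
# which makes the arms non-stationary from the monopolist's viewpoint.
import random

random.seed(0)

PRICES = [1.0, 2.0, 3.0]   # finite price set (hypothetical values)
EPSILON = 0.1              # exploration rate (assumed)
STEP = 0.05                # customer adaptation step (assumed)

# Each customer: [ability_to_pay, visit_probability] -- both hypothetical.
customers = [[1.5, 0.5], [2.5, 0.5], [3.5, 0.5]]

counts = {p: 0 for p in PRICES}      # times each price was posted
revenue = {p: 0.0 for p in PRICES}   # cumulated revenue per price

def average(p):
    """Empirical mean revenue of arm (price) p."""
    return revenue[p] / counts[p] if counts[p] else 0.0

for t in range(5000):
    # Epsilon-greedy choice over the arms (prices).
    if random.random() < EPSILON:
        price = random.choice(PRICES)
    else:
        price = max(PRICES, key=average)
    counts[price] += 1
    for c in customers:
        if random.random() < c[1]:            # customer visits the store
            if price <= c[0]:                 # price within ability-to-pay
                revenue[price] += price
                c[1] = min(1.0, c[1] + STEP)  # good experience: visit more
            else:
                c[1] = max(0.0, c[1] - STEP)  # bad experience: visit less

best = max(PRICES, key=average)
print(best)
```

The update rule on `c[1]` is the source of non-stationarity: the empirical mean of an arm drifts as the population of visitors reacts to past prices, which is why a fixed-horizon average can mislead the monopolist.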
Yann BRAOUEZEC
Multi-Armed Bandit Problem, Adaptive Learning, Stochastic Market Dynamics, Exploration-Exploitation Trade-off, Non-Stationarity
English
