Stochastic Adaptive Dynamics of a Simple Market as a Non-Stationary Multi-Armed Bandit Problem
We develop a dynamic monopoly pricing model cast as a non-stationary multi-armed bandit problem. At each time step, the monopolist chooses a price from a finite set, and each customer decides stochastically and independently whether to visit his store. Each customer is characterized by two parameters: an ability-to-pay and a probability of visiting. The problem is non-stationary for the monopolist because each customer modifies this probability with experience. We define an ex-ante optimal price for our problem and then examine two different ways of learning it. In the first part, assuming the monopolist knows everything but the ability-to-pay, we suggest a simple counting rule based on purchase behavior which allows him to obtain enough information to compute the optimal price. In the second part, assuming no particular knowledge, we consider the case in which the monopolist uses an adaptive stochastic algorithm. When learning is easy (difficult), our simulations suggest that the monopolist chooses (fails to choose) the optimal price on each sample path.
Multi-armed Bandit Problem, Adaptive Learning, Stochastic Market Dynamics, Exploration-exploitation Trade-off, Non-stationarity
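The market dynamics summarized above can be sketched as a small simulation. Everything below is an illustrative assumption rather than the paper's exact specification: the price set, the customers' parameters, the reinforcement rule by which a customer adjusts his visit probability, and the use of epsilon-greedy as a stand-in for the adaptive stochastic algorithm are all hypothetical choices made only to show the structure of the problem.

```python
import random

# Hypothetical finite price set (illustrative, not from the paper)
PRICES = [1.0, 2.0, 3.0, 4.0]

class Customer:
    """A customer with an ability-to-pay and an adaptive visit probability."""
    def __init__(self, ability_to_pay, visit_prob):
        self.a = ability_to_pay   # ability-to-pay
        self.p = visit_prob       # current probability of visiting the store

    def step(self, price, rng):
        # Visit stochastically; buy iff the posted price is affordable.
        # The update rule below is an assumption: a purchase reinforces
        # visiting, a fruitless visit discourages it.
        if rng.random() >= self.p:
            return 0.0            # stayed home, no revenue
        if price <= self.a:
            self.p = min(1.0, self.p + 0.1 * (1.0 - self.p))
            return price          # purchase revenue
        self.p = max(0.05, self.p - 0.1 * self.p)
        return 0.0

def run(T=5000, eps=0.1, seed=0):
    """Epsilon-greedy monopolist over the finite price set (an assumed
    learning rule standing in for the paper's adaptive algorithm)."""
    rng = random.Random(seed)
    customers = [Customer(a, 0.5) for a in (1.5, 2.5, 3.5)]
    revenue = {q: 0.0 for q in PRICES}
    pulls = {q: 1 for q in PRICES}
    for _ in range(T):
        if rng.random() < eps:
            price = rng.choice(PRICES)  # explore
        else:                           # exploit best empirical average
            price = max(PRICES, key=lambda q: revenue[q] / pulls[q])
        revenue[price] += sum(c.step(price, rng) for c in customers)
        pulls[price] += 1
    # Price with the best empirical average revenue after learning
    return max(PRICES, key=lambda q: revenue[q] / pulls[q])
```

Because the customers' visit probabilities move with their experience, the arm payoffs drift over time, which is exactly the non-stationarity the abstract refers to.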