12. Entropy and arrow of time#
Microscopic view vs. macroscopic view. Usually we do not know the microscopic state of a system accurately, or we do not care to know it, and we summarize it with a macroscopic view: in this description some information is lost.
Usually:

- there is microscopic symmetry, with a uniform transition matrix;
- many microstates correspond to the same macrostate.
As the system evolves, its macrostate may evolve towards a stationary state, corresponding to the macrostate with the most microstates. This evolution has a preferred direction, the one with non-decreasing expected entropy, which defines an arrow of time. It can also be described in terms of increasing ignorance.
A macroscopic equilibrium corresponds to a stationary distribution of the macrostates with a quite sharp maximum. While the macroscopic view looks stationary, the system keeps evolving at the microscopic level. The amplitude of the relative fluctuations around equilibrium decreases as \(\sim \frac{1}{\sqrt{N}}\).
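This relaxation towards equilibrium, and the residual fluctuations around it, can be seen in a minimal Monte Carlo sketch of the two-room box model discussed below (the `simulate` helper and its parameters are illustrative, not part of the notes):

```python
import random

# Monte Carlo sketch of the two-room box model: at each step one of the
# N balls is picked uniformly at random and moved to the other room.
def simulate(N, steps, seed=0):
    rng = random.Random(seed)
    n_left = N  # start far from equilibrium: all balls in the left room
    trajectory = [n_left]
    for _ in range(steps):
        if rng.random() < n_left / N:
            n_left -= 1  # a left-room ball was picked: it moves right
        else:
            n_left += 1  # a right-room ball was picked: it moves left
        trajectory.append(n_left)
    return trajectory

traj = simulate(N=1000, steps=20_000)
tail = traj[10_000:]  # after relaxation, N_L fluctuates around N/2
print(min(tail), max(tail))
```

The macrostate drifts towards \(N_L = N/2\) and then hovers there, with excursions of order \(\sqrt{N}\).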
12.1. Box occupation example#
Remark. In this example, the microscopic dynamics of the system is not deterministic.
A box with two rooms contains \(N\) balls. The balls are numbered, so that they are distinguishable if you can see their labels.
Macrostate definition. A macrostate of the system is defined here by the number of balls in each room, \((N_L, N_R)\). As the total number of balls is given and constant, the number of balls in one room, e.g. \(N_L\) in the left room, is enough to determine the macrostate, since \(N_R = N - N_L\).
Microstate counting. The number of microstates producing the macrostate \(N_L\) is the number of combinations, \(C_{N, N_L} = \binom{N}{N_L}\)
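This count can be checked directly with Python's `math.comb` (a small illustrative snippet for \(N = 10\)):

```python
from math import comb

# Microstates per macrostate: choose which of the N labelled balls sit in
# the left room.
N = 10
counts = [comb(N, n_left) for n_left in range(N + 1)]
print(counts)       # symmetric in n_left, peaked at N // 2
print(sum(counts))  # total number of microstates, 2**N
```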
Entropy of a macrostate. …
Probability transition between macrostates. Since the transition probability is uniform over microstate transitions, the probability of a transition between macrostates reads
Expected change of entropy.
todo keep going with the algebra without the limit of large number of particles…
In the limit \(N \gg 1\), \(n \gg 1\), \(N-n \gg 1\),
Both factors change sign when \(\frac{n}{N} = \frac{1}{2}\). For \(\frac{n}{N} < \frac{1}{2}\) both factors are positive, for \(\frac{n}{N} > \frac{1}{2}\) both factors are negative. Thus, the expected variation of entropy of this process is non-negative for every value of \(n \in [0,N]\) and thus for every transition,
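A quick numerical sign check of the large-\(N\) expression, written here in the factorized form \((1 - \frac{2n}{N})\,\ln\frac{N-n}{n}\) suggested by the two factors above (our reconstruction of the limiting formula, offered as a sketch):

```python
from math import log

# Sign check of the large-N expected entropy change: the two factors
# (1 - 2n/N) and ln((N - n)/n) change sign together at n/N = 1/2, so
# their product is non-negative for every n.
N = 1000
dS = [(1 - 2 * n / N) * log((N - n) / n) for n in range(1, N)]
print(all(x >= 0 for x in dS))
```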
Markov process of macrostates. The macrostates and the transitions between them can be modeled as a stationary Markov process. The probability of the system being in (macro)state \(n\) at time \(t\) is \(p_n(t)\), and using the transition probability \(P(n|m)\) from state \(m\) at \(t-1\) to state \(n\) at \(t\), the following holds
or using matrix formalism, introducing transition probability matrix \(\left\{\mathbf{P}\right\}_{mn} := P(n|m)\),
The transition probabilities (12.1) define the elements of the transition probability matrix \(\mathbf{P}\),
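As a sketch, the matrix \(\mathbf{P}\) can be assembled numerically, assuming the rates implied by a uniform microscopic transition probability, \(P(n-1|n) = \frac{n}{N}\) and \(P(n+1|n) = \frac{N-n}{N}\) (our reading of (12.1); the helper name is illustrative):

```python
import numpy as np

# Macrostate transition matrix {P}_{mn} = P(n|m) for the box model:
# P(n-1|n) = n/N (a left-room ball is picked),
# P(n+1|n) = (N-n)/N (a right-room ball is picked).
def transition_matrix(N):
    P = np.zeros((N + 1, N + 1))
    for n in range(N + 1):
        if n > 0:
            P[n, n - 1] = n / N
        if n < N:
            P[n, n + 1] = (N - n) / N
    return P

P = transition_matrix(4)
print(P.sum(axis=1))  # every row sums to 1: P is stochastic
```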
Stationary distribution. The stationary distribution \(\mathbf{p}^*\) is invariant under a transition, i.e. it satisfies \(\mathbf{p}^* = \mathbf{p}^* \mathbf{P}\). The stationary distribution of the process is the Binomial distribution
Proof
This can be proved by direct computation. The first element reads
or
The \(n^{th}\) element (\(1 \le n \le N-1\)) reads
or
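The stationarity of the Binomial distribution can also be checked numerically, assuming the rates \(P(n-1|n) = \frac{n}{N}\) and \(P(n+1|n) = \frac{N-n}{N}\) for the macrostate chain (an illustrative check, not part of the proof):

```python
import numpy as np
from math import comb

# Numerical check that Binomial(N, 1/2) satisfies p* = p* P, with
# {P}_{mn} = P(n|m) built from the uniform-microscopic rates.
N = 20
P = np.zeros((N + 1, N + 1))
for n in range(N + 1):
    if n > 0:
        P[n, n - 1] = n / N
    if n < N:
        P[n, n + 1] = (N - n) / N

p_star = np.array([comb(N, n) for n in range(N + 1)]) / 2.0**N
print(np.allclose(p_star @ P, p_star))
```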
Statistics of the stochastic process. …
Statistics of the stationary distribution.

- Average: \(\mu = \frac{N}{2}\)
- Variance: \(\sigma^2 = \frac{N}{4}\)
- Standard deviation: \(\sigma = \frac{\sqrt{N}}{2}\)
Thus the relative fluctuation around the average can be estimated with the ratio of the standard deviation and the average value,
so, the relative fluctuation decreases as \(N\) increases.
| \(N\) | Relative fluctuation |
|---|---|
| \(10^2\) | \(10^{-1}\) |
| \(10^4\) | \(10^{-2}\) |
| \(10^6\) | \(10^{-3}\) |
| \(10^8\) | \(10^{-4}\) |
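These values follow directly from \(\frac{\sigma}{\mu} = \frac{1}{\sqrt{N}}\); a one-line check:

```python
from math import sqrt

# Relative fluctuation of the stationary Binomial(N, 1/2) distribution:
# sigma / mu = (sqrt(N)/2) / (N/2) = 1 / sqrt(N).
for N in (10**2, 10**4, 10**6, 10**8):
    mu, sigma = N / 2, sqrt(N) / 2
    print(f"N = {N:>9}  sigma/mu = {sigma / mu:.0e}")
```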
Average of a binomial distribution
Variance of a binomial distribution
Then,
where the first term follows from a similar procedure as the one used for the average, using the identity \(n(n-1)\left(\begin{matrix} N \\n \end{matrix}\right) = N(N-1) \left( \begin{matrix} N-2 \\ n-2 \end{matrix} \right)\),
Thus the variance reads
Markov process of microstates. A microstate is defined by the room id of each of the \(N\) balls. A one-step transition switches the room of one ball; if the transition probability is uniform over the balls, each of the \(N\) possible moves has probability \(\frac{1}{N}\). Now, consider two microstates \(i\), \(j\). If these states are adjacent, i.e. they differ by the room of exactly one ball, the transition probabilities are \(P^{micro}(i \rightarrow j) = P^{micro}(j \rightarrow i) = \frac{1}{N}\), and zero otherwise. Thus, the transition probability matrix \(\mathbf{P}^{micro}\) between microstates is symmetric, and therefore its stationary distribution is uniform.

This is immediately proved using the normalization condition \(\mathbf{P} \mathbf{1} = \mathbf{1}\), the definition of a stationary distribution, \(\mathbf{p}^{* T} = \mathbf{p}^{* T} \mathbf{P}\), and the symmetry of the transition matrix, \(\mathbf{P} = \mathbf{P}^T\). Transposing the definition of the stationary distribution gives \(\mathbf{p}^* = \mathbf{P} \mathbf{p}^*\); comparing with the normalization condition yields \(\mathbf{p}^* = \alpha \mathbf{1}\), and the normalization \(1 = |\mathbf{p}^*|_1 = \alpha M\) gives \(\alpha = \frac{1}{M}\), where \(M = 2^N\) is the number of microstates. The stationary distribution of the symmetric Markov process is therefore \(\mathbf{p}^* = \frac{1}{M} \mathbf{1}\).
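A small-\(N\) enumeration makes this concrete (an illustrative check over the \(2^N\) microstates; variable names are ours):

```python
import numpy as np
from itertools import product

# Microstate chain: a microstate is the tuple of room ids (0 = left,
# 1 = right), one per labelled ball; adjacent microstates differ by
# exactly one ball and are connected with probability 1/N.
N = 4
states = list(product((0, 1), repeat=N))  # all 2**N microstates
M = len(states)
index = {s: k for k, s in enumerate(states)}

P = np.zeros((M, M))
for s in states:
    for ball in range(N):
        t = list(s)
        t[ball] = 1 - t[ball]  # move this ball to the other room
        P[index[s], index[tuple(t)]] = 1 / N

print(np.allclose(P, P.T))              # the matrix is symmetric
p_star = np.ones(M) / M                 # uniform over the 2**N microstates
print(np.allclose(p_star @ P, p_star))  # ... and it is stationary
```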
Combinatorics naturally links the distribution of microstates and the distribution of the macrostates.
H-Theorem for symmetric Markov chains. A discrete version of Boltzmann's H-Theorem exists for symmetric Markov processes, stating that the entropy \(H(\mathbf{p}(t)) := - \sum_k p_k(t) \ln p_k(t)\) is non-decreasing, i.e.
Proof
Remark. A generic transition probability matrix \(\mathbf{P}\) satisfies the normalization condition \(\sum_j P_{ij} = 1\) for every \(i\) (row sum). For a symmetric probability matrix, \(\mathbf{P} = \mathbf{P}^T\), the column sum condition \(\sum_i P_{ij} = 1\) also holds.
Remark. The function \(f(x) = x \ln x\) is convex, i.e. \(f''(x) > 0\), for \(x > 0\).
Jensen’s Inequality. As \(f\) is convex, and \(\sum_{i} P_{ik} = 1\), with \(P_{ik} \in (0,1)\),
Using Jensen’s inequality in the expression of \(H(t+1)\),
i.e. \(H(t+1) \ge H(t)\).
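A numerical illustration of the theorem, using a symmetric chain chosen for illustration (not from the example above): a random walk on a ring of \(K\) sites, whose transition matrix \(P_{i,(i \pm 1) \bmod K} = \frac{1}{2}\) is symmetric.

```python
import numpy as np

# Discrete H-theorem check: for a symmetric transition matrix,
# H(p(t)) = -sum_k p_k ln p_k is non-decreasing along p(t+1) = p(t) P.
K = 7
P = np.zeros((K, K))
for i in range(K):
    P[i, (i - 1) % K] = 0.5
    P[i, (i + 1) % K] = 0.5

def H(p):
    q = p[p > 0]  # the term 0 * ln 0 is taken as 0
    return -np.sum(q * np.log(q))

p = np.zeros(K)
p[0] = 1.0  # start fully concentrated: H = 0
entropies = []
for _ in range(50):
    entropies.append(H(p))
    p = p @ P

print(all(b >= a - 1e-12 for a, b in zip(entropies, entropies[1:])))
```

Starting from a fully concentrated distribution, the entropy rises monotonically towards its maximum \(\ln K\), the value attained by the uniform stationary distribution.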