The Wasserstein distance is a powerful tool in modern machine learning to metrize the space of probability distributions in a way that takes into account the geometry of the domain. Therefore, a lot of attention has been devoted in the literature to understanding rates of convergence for Wasserstein distances based on i.i.d data. However, often in machine learning applications, especially in reinforcement learning, object tracking, performative prediction, and other online learning problems, observations are received sequentially, rendering some inherent temporal dependence. Motivated by this observation, we attempt to understand the problem of estimating Wasserstein distances using the natural plug-in estimator based on stationary beta-mixing sequences, a widely used assumption in the study of dependent processes. Our rates of convergence results are applicable under both short and long-range dependence. As expected, under short-range dependence, the rates match those observed in the i.i.d. case. Interestingly, however, even under long-range dependence, we can show that the rates can match those in the i.i.d. case provided the (intrinsic) dimension is large enough. Our analysis establishes a non-trivial trade-off between the degree of dependence and the complexity of certain function classes on the domain. The key technique in our proofs is a blend of the big-block-small-block method coupled with Berbee’s lemma and chaining arguments for suprema of empirical processes.