Semantic Variational Bayes Based on Semantic Information G Theory for Solving Latent Variables
Chenguang Lu

DOI: https://doi.org/10.30564/jeis.v8i1.12828

Abstract
The minimum variational free energy criterion comprises two criteria, the maximum semantic information criterion and the maximum information efficiency criterion, but it does not provide a method for balancing them. Semantic Information G Theory, which the author proposed in earlier work, extends the rate-distortion function R(D) to the rate-fidelity function R(G), where R is the minimum Shannon mutual information for a given semantic mutual information G. Semantic Variational Bayes (SVB) is based on the parameter solution of R(G); its variational and iterative methods originate from Shannon et al.'s research on the rate-distortion function. SVB uses not only likelihood functions but also truth, membership, similarity, distortion, and copula density functions as constraint functions. It explicitly uses the maximum information efficiency (G/R) criterion and facilitates the trade-off between maximum semantic information and maximum information efficiency. The computational experiments include: 1) using several mixture models as examples to show that they converge as G/R increases; 2) demonstrating the application of SVB to data compression with a group of error ranges as the constraint; 3) illustrating how the semantic information measure and SVB can be used for maximum entropy control and reinforcement learning in control tasks with given range constraints, providing numerical evidence for balancing control's purposiveness and efficiency. A limitation of SVB is that it does not account for parameter probability distributions. Further research is needed to apply SVB to deep learning.
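As a minimal numerical illustration of the quantities the abstract names, the sketch below computes the Shannon mutual information R = I(X; Y) and the semantic mutual information G for a toy two-label source with Gaussian-shaped truth functions, then prints the information efficiency G/R. All numerical choices (the discretization, the source distribution p(x), the channel p(y|x), and the truth-function centers) are assumptions made for this sketch, not the paper's experiments; only the formulas G = Σ p(x, y) log[T(θ_y|x)/T(θ_y)] and the logical probability T(θ_y) = Σ p(x) T(θ_y|x) follow the author's published definitions.

```python
import numpy as np

# Minimal sketch, assuming a discretized X, a two-label Y, and Gaussian-shaped
# truth functions; every numerical choice below is illustrative, not taken
# from the paper's experiments. The formulas used are:
#   G = sum_{x,y} p(x, y) * log( T(theta_y | x) / T(theta_y) ),
#   T(theta_y) = sum_x p(x) * T(theta_y | x)   (logical probability),
# and R here is the ordinary Shannon mutual information I(X; Y).

x = np.linspace(-5.0, 5.0, 201)                  # discretized universe of X
px = np.exp(-0.5 * x**2)
px /= px.sum()                                   # toy source distribution p(x)

# Assumed channel p(y|x): two labels produced by a soft threshold on x
p_y1_given_x = 1.0 / (1.0 + np.exp(-2.0 * x))
pyx = np.vstack([1.0 - p_y1_given_x, p_y1_given_x])   # rows: y = 0, y = 1

pxy = pyx * px                                   # joint distribution p(y, x)
py = pxy.sum(axis=1, keepdims=True)              # marginal p(y)

R = np.sum(pxy * np.log(pxy / (py * px)))        # Shannon mutual information

# Assumed truth functions T(theta_y | x): one Gaussian bump per label
T = np.vstack([np.exp(-0.5 * (x + 1.5) ** 2),
               np.exp(-0.5 * (x - 1.5) ** 2)])
T_logical = (T * px).sum(axis=1, keepdims=True)  # logical probabilities T(theta_y)

G = np.sum(pxy * np.log(T / T_logical))          # semantic mutual information
print(f"R = {R:.4f} nats, G = {G:.4f} nats, G/R = {G / R:.4f}")
```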
Keywords:
Variational Bayes; Semantic Information Theory; Rate-Distortion Function; Rate-Fidelity Function; Latent Variable; Expectation–Maximization (EM) Algorithm; Variational Free Energy; Maximum Entropy Control
License
Copyright © 2026 Chenguang Lu

This is an open access article under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.