Semantic Variational Bayes Based on Semantic Information G Theory for Solving Latent Variables
Chenguang Lu

DOI: https://doi.org/10.30564/jeis.v8i1.12828

Abstract
The minimum variational free energy criterion comprises two criteria, the maximum semantic information criterion and the maximum information efficiency criterion, but it does not provide a method for balancing them. Semantic Information G Theory, which the author proposed in earlier work, extends the rate-distortion function R(D) to the rate-fidelity function R(G), where R is the minimum Shannon mutual information for a given semantic mutual information G. Semantic Variational Bayes (SVB) is based on the parameter solution of R(G); its variational and iterative methods originate from Shannon et al.'s research on the rate-distortion function. SVB uses not only likelihood functions but also truth, membership, similarity, distortion, and copula density functions as constraint functions. It explicitly uses the maximum information efficiency (G/R) criterion and facilitates the trade-off between maximum semantic information and maximum information efficiency. The computational experiments include: 1) using several mixture models as examples to show that they converge as G/R increases; 2) demonstrating the application of SVB to data compression with a group of error ranges as the constraint; 3) illustrating how the semantic information measure and SVB can be used for maximum entropy control and reinforcement learning in control tasks with given range constraints, providing numerical evidence for balancing control's purposiveness and efficiency. A limitation of SVB is that it does not account for parameter probability distributions. Further research is needed to apply SVB to deep learning.
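As a minimal numerical illustration of the quantities the abstract names, the sketch below computes the Shannon mutual information R = I(X; Y) and the semantic mutual information G for a toy two-label source with Gaussian-shaped truth functions, then prints the information efficiency G/R. All numerical choices (the discretization, the source distribution p(x), the channel p(y|x), and the truth-function centers) are assumptions made for this sketch, not the paper's experiments; only the formulas G = Σ p(x, y) log[T(θ_y|x)/T(θ_y)] and the logical probability T(θ_y) = Σ p(x) T(θ_y|x) follow the author's published definitions.

```python
import numpy as np

# Minimal sketch, assuming a discretized X, a two-label Y, and Gaussian-shaped
# truth functions; every numerical choice below is illustrative, not taken
# from the paper's experiments. The formulas used are:
#   G = sum_{x,y} p(x, y) * log( T(theta_y | x) / T(theta_y) ),
#   T(theta_y) = sum_x p(x) * T(theta_y | x)   (logical probability),
# and R here is the ordinary Shannon mutual information I(X; Y).

x = np.linspace(-5.0, 5.0, 201)                  # discretized universe of X
px = np.exp(-0.5 * x**2)
px /= px.sum()                                   # toy source distribution p(x)

# Assumed channel p(y|x): two labels produced by a soft threshold on x
p_y1_given_x = 1.0 / (1.0 + np.exp(-2.0 * x))
pyx = np.vstack([1.0 - p_y1_given_x, p_y1_given_x])   # rows: y = 0, y = 1

pxy = pyx * px                                   # joint distribution p(y, x)
py = pxy.sum(axis=1, keepdims=True)              # marginal p(y)

R = np.sum(pxy * np.log(pxy / (py * px)))        # Shannon mutual information

# Assumed truth functions T(theta_y | x): one Gaussian bump per label
T = np.vstack([np.exp(-0.5 * (x + 1.5) ** 2),
               np.exp(-0.5 * (x - 1.5) ** 2)])
T_logical = (T * px).sum(axis=1, keepdims=True)  # logical probabilities T(theta_y)

G = np.sum(pxy * np.log(T / T_logical))          # semantic mutual information
print(f"R = {R:.4f} nats, G = {G:.4f} nats, G/R = {G / R:.4f}")
```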
Keywords:
Variational Bayes; Semantic Information Theory; Rate-Distortion Function; Rate-Fidelity Function; Latent Variable; Expectation–Maximization (EM) Algorithm; Variational Free Energy; Maximum Entropy Control
License
Copyright © 2026 Chenguang Lu

This is an open access article under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.