Alignment Methods for Large Language Models Based on Human Feedback
DOI: https://doi.org/10.54097/dy7eda57

Keywords
Human Feedback, Large Language Models, Alignment, Reinforcement Learning

Abstract
In recent years, artificial intelligence has developed rapidly and large language models have come into wide use. However, as model capabilities continue to improve, their outputs risk containing inaccurate information, being misleading, or deviating from human values. To ensure that models are safe, reliable, and adhere to ethical standards, it is especially important to guide model behavior with alignment techniques based on human feedback. This article systematically surveys the relevant alignment methods: it clarifies what human feedback means in this setting, analyzes the key stages of the alignment pipeline, categorizes and discusses the methods by their underlying principles and implementation techniques, and compiles the commonly used datasets and evaluation frameworks before closing with a summary. The significance of this survey lies in constructing a complete knowledge system and clarifying the technical implementation path, providing theoretical support and guidance for aligning large language models with human intent so that they can be safely applied in critical domains.
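As a concrete illustration of how pairwise human feedback is typically turned into a training signal in such pipelines, the sketch below implements the Bradley-Terry-style pairwise loss commonly used to train a reward model from "chosen vs. rejected" response pairs. This is a minimal sketch, not the specific method of any system covered here: the `RewardModel` wrapper, the embedding dimension, and the synthetic tensors are illustrative assumptions, with a small MLP standing in for a pretrained transformer backbone.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardModel(nn.Module):
    """Toy stand-in for an LLM backbone with a scalar reward head.

    In practice the encoder would be a pretrained transformer; a small
    MLP over fixed-size embeddings keeps this sketch self-contained.
    """

    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(embed_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, embed_dim) response representation -> (batch,) scalar reward
        return self.encoder(x).squeeze(-1)


def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    Minimizing this pushes the reward of the human-preferred response
    above the reward of the rejected one.
    """
    return -F.logsigmoid(r_chosen - r_rejected).mean()


# One training step on synthetic preference pairs (illustrative only).
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

chosen = torch.randn(8, 64)    # representations of preferred responses
rejected = torch.randn(8, 64)  # representations of rejected responses

loss = preference_loss(model(chosen), model(rejected))
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"pairwise preference loss: {loss.item():.4f}")
```

In the standard feedback-based alignment pipeline, a reward model trained this way then serves as the optimization target for a second stage, typically a policy-gradient method such as PPO, that fine-tunes the language model itself.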