Social Choice for AI Alignment: Dealing with Diverse Human Feedback
Briefly

Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise problematic behavior, so that, for example, they refuse requests for help with committing crimes or producing racist text.
One approach to fine-tuning, called reinforcement learning from human feedback, learns from humans' expressed preferences over multiple outputs.
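To make the idea of learning from expressed preferences concrete, here is a minimal, illustrative sketch (not the paper's code) of how a reward model is commonly trained in RLHF: given pairs of outputs where an annotator marked one as preferred, the model is trained with the standard Bradley-Terry pairwise loss to score the preferred output higher. The model architecture, embedding dimension, and random "embeddings" below are placeholder assumptions for the example.

```python
# Illustrative sketch of RLHF reward-model training from pairwise preferences.
# The Bradley-Terry loss pushes the score of the chosen output above the
# score of the rejected output: -log sigmoid(r_chosen - r_rejected).

import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyRewardModel(nn.Module):
    """Toy reward model: maps a fixed-size output embedding to a scalar score."""

    def __init__(self, embed_dim: int = 16):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(embed_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.scorer(x).squeeze(-1)


def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of the annotator's choice under Bradley-Terry."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()


model = TinyRewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Placeholder embeddings standing in for the preferred / dispreferred outputs.
chosen = torch.randn(8, 16)
rejected = torch.randn(8, 16)

optimizer.zero_grad()
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
optimizer.step()
print(f"preference loss: {loss.item():.4f}")
```

In a real RLHF pipeline the scored inputs would be model-generated responses rather than random vectors, and the trained reward model would then guide a reinforcement-learning step; the question the paper raises is how to aggregate such preference data when annotators disagree.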
Read at arXiv.org