Reinforcement Learning from Human Feedback


Until a few years ago, the most advanced language models available were GPT-2 and BERT. GPT-2 was the most advanced autoregressive, decoder-based model suited to text generation, while T5 was state of the art for other tasks such as translation and summarization. These models were a great starting point but were …
