Earlier this month we demonstrated the use of embeddings to combat hallucination in LLM summarization and as a form of quality ranking of machine-generated summaries. Let's explore a more advanced example that mixes both real and fictionally expanded accounts of the White House cocaine story, along with language ranging from journalistic to creative.

The end result is that, unlike our earlier examples, here embeddings do not yield as strong a stratification. On closer inspection, a big driving force behind these results appears to be that ChatGPT's "creative" and "inspired fiction" summaries are not actually as creative or as fictional as expected. Instead, they largely just rephrase a relatively accurate factual summary into flowery prose. Because embeddings are less responsive to salience than to topical affinity, the embeddings in this case appear to fixate on the factual contents of each summary rather than on its surface expression. This has both positive and negative implications. On the downside, it means embedding-based quality ranking may rank a highly emotional and flowery summary above a more straightforward clinical account.
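To make the ranking mechanism concrete, here is a minimal sketch of embedding-based quality ranking. It assumes that the source article and each summary have already been converted to embedding vectors by some embedding model, and it scores each summary by cosine similarity to the source. The function names and the three-dimensional toy vectors are hypothetical, chosen only for illustration. They are not the model, vectors, or scoring method used in this analysis. The key point is that a summary whose factual content stays close to the source keeps a high similarity score even when its wording is more ornate. That is why a flowery rephrasing of accurate facts is not heavily penalized.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero-length vector")
    return dot / (norm_a * norm_b)


def rank_summaries(
    source: Sequence[float], summaries: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Score each summary against the source embedding and sort, most similar first."""
    scored = [(name, cosine_similarity(source, vec)) for name, vec in summaries.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy vectors standing in for real embeddings (illustration only).
    source_vec = [0.9, 0.1, 0.0]
    summary_vecs = {
        "clinical": [0.85, 0.15, 0.05],
        "flowery": [0.8, 0.1, 0.3],
        "fictional": [0.2, 0.1, 0.9],
    }
    for name, score in rank_summaries(source_vec, summary_vecs):
        print(f"{name}: {score:.3f}")
```

Real embeddings have hundreds or thousands of dimensions, but the ranking logic is the same. Because cosine similarity ignores vector length, only the direction of each embedding matters, which loosely corresponds to what a summary is about rather than how it is phrased.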