NLG (natural language generation) may be too powerful for its own good. This technology can generate a huge variety of natural-language text in vast quantities at high speed.

Functioning like a superpowered “autocomplete” program, NLG continues to improve in speed and sophistication. It enables people to author complex documents without having to manually specify every word that appears in the final draft. Current NLG approaches include everything from template-based mail-merge programs that generate form letters to sophisticated AI systems that incorporate computational linguistics algorithms and can generate a dizzying array of content types.
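At the simple end of that spectrum, template-based generation amounts to filling blanks in prewritten text. The sketch below uses Python's standard `string.Template` to produce mail-merge form letters. The letter wording, the placeholder names (`name`, `order_id`, `ship_date`) and the `generate_letters` helper are hypothetical, chosen only for illustration:

```python
from string import Template

# Hypothetical form-letter template; each $placeholder is filled per record.
LETTER = Template(
    "Dear $name,\n"
    "Your order #$order_id shipped on $ship_date.\n"
    "Thank you for shopping with us."
)

def generate_letters(records):
    """Render one form letter per record (a dict of placeholder values)."""
    # substitute() raises KeyError if a record is missing a placeholder value.
    return [LETTER.substitute(rec) for rec in records]

customers = [
    {"name": "Ada", "order_id": "1001", "ship_date": "March 3"},
    {"name": "Grace", "order_id": "1002", "ship_date": "March 4"},
]

for letter in generate_letters(customers):
    print(letter)
    print("---")
```

Every word in the output was written by a person in advance. The program only decides which values go into which slots, which is why this approach is predictable but inflexible compared with the AI systems discussed below.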

The promise and pitfalls of GPT-3

Today’s most sophisticated NLG algorithms learn the intricacies of human speech by training complex statistical models on huge corpora of human-written texts.
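The core idea, learning which words tend to follow which from a body of human-written text and then using those statistics to "autocomplete" a prompt, can be illustrated with a toy bigram model. This is a deliberately simplified stand-in: systems such as GPT-3 use large transformer neural networks rather than word-pair counts. The three-sentence corpus and the `train_bigram_model` and `autocomplete` helpers are hypothetical:

```python
import random
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count how often each word follows each other word in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def autocomplete(model, prompt, max_words=5, seed=0):
    """Extend the prompt by sampling each next word in proportion to its count."""
    rng = random.Random(seed)
    words = prompt.lower().split()
    for _ in range(max_words):
        followers = model.get(words[-1])  # .get() avoids creating empty entries
        if not followers:
            break  # no continuation was seen in the training text
        choices, weights = zip(*followers.items())
        words.append(rng.choices(choices, weights=weights)[0])
    return " ".join(words)

# Hypothetical miniature training corpus.
corpus = [
    "the model generates text",
    "the model learns patterns from text",
    "people write text",
]
model = train_bigram_model(corpus)
print(autocomplete(model, "the model"))
```

Scaling the same principle from word pairs counted over three sentences to billions of learned parameters trained on huge corpora is, loosely speaking, what lets modern systems produce fluent text on almost any topic.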

Introduced in May 2020, OpenAI’s Generative Pretrained Transformer 3 (GPT-3) can generate many types of natural-language text when given only a handful of examples in its prompt, with no task-specific retraining. The algorithm can generate samples of news articles that human evaluators have difficulty distinguishing from articles written by humans. It can also generate a complete essay from nothing more than a single opening sentence or a few words of prompt. Impressively, it can even compose a song given only a musical intro or lay out a webpage based solely on a few lines of HTML code.

With AI as its rocket fuel, NLG is becoming more and more powerful. At GPT-3’s launch, OpenAI reported that the model contains 175 billion parameters. Showing that GPT-3 is not the only NLG game in town, Microsoft announced several months later a new version of its open source DeepSpeed library that can efficiently train models with up to 1 trillion parameters. And in January 2021, Google released a trillion-parameter NLG model of its own, dubbed Switch Transformer.

Preventing toxic content is easier said than done

Impressive as these NLG industry milestones might be, the technology’s immense power may also be its chief weakness. Even when NLG tools are used with the best intentions, their relentless productivity can overwhelm a human author’s ability to thoroughly review every last detail that gets published under their name. Consequently, the author of record on a machine-generated text may not realize that they are publishing distorted, false, offensive, or defamatory material.

Copyright © 2021 IDG Communications, Inc.