According to OnCrawl’s Vincent Terrasi – 2nd place winner in TechSEO Boost’s Research Competition – “2019 is the best year for using AI for text automation.” Below, we’ll explore the technologies behind his statement and the results of the research he shared in his 2019 TechSEO Boost presentation.
Text Generation Techniques 2016 vs. 2019
Vincent led his presentation by comparing the text generation landscape in 2016 vs. 2019. He pointed out that text generation has been employed by SEOs before. Practices such as text spinning, which swaps out a few words in an existing piece of text to create a pseudo-unique piece of content, were once widespread. Let’s get real, though: Google is no longer fooled by such cheap tactics.
BERT and Other Natural Language Processing Transformers
In October 2019, Google launched BERT for US English queries and has since rolled it out to some 70 languages worldwide. BERT stands for Bidirectional Encoder Representations from Transformers and uses a neural network approach to natural language processing (NLP). According to Vincent, BERT helps Google detect thin and spammy content in a far more sophisticated manner. However, BERT isn’t the only model out there. Vincent goes into detail describing alternative models like GPT-2 from OpenAI, ELMo from the Allen Institute for AI (the team behind AllenNLP), and ULMFiT from Jeremy Howard and Sebastian Ruder.
He then tells us why he settled on the GPT-2 model for his own research. While Google’s BERT is a bidirectional transformer that looks at the context on both sides of a word, GPT-2 is a powerful unidirectional (left-to-right) transformer, which makes it a natural fit for generating text one token at a time.
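The bidirectional vs. unidirectional distinction can be made concrete with a toy sketch (this is illustrative only, not the models’ actual code): a BERT-style encoder lets every token attend to every other token, while a GPT-2-style decoder applies a “causal” mask so each token only sees the tokens to its left.

```python
def attention_mask(n_tokens, causal):
    """Return an n x n matrix: 1 = may attend, 0 = masked out."""
    return [[1 if (not causal or j <= i) else 0 for j in range(n_tokens)]
            for i in range(n_tokens)]

bert_style = attention_mask(4, causal=False)  # all ones: full two-sided context
gpt2_style = attention_mask(4, causal=True)   # lower-triangular: left context only

for row in gpt2_style:
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```

That lower-triangular shape is why a GPT-2-style model can generate text left to right: each new token depends only on what has already been written.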
So, we have our preferred transformer; now we’re ready to generate our own text, right? Not so fast. In order for these transformers to create text, they need specific output from an “attention model” created from a collection of selected texts. Essentially, an attention model describes how words, or even characters, relate to one another in context. Using attention models, we can understand the relationships between words in a sentence and throughout the text.
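A minimal sketch of the idea, using scaled dot-product attention with tiny hand-made 2-d vectors (real models learn embeddings with hundreds of dimensions, so every number here is a hypothetical stand-in):

```python
import math

def attention_weights(query, keys):
    """Scaled dot-product attention: score every key vector against the
    query vector, then softmax the scores into weights that sum to 1."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-d "embeddings" for the words in "the bank of the river":
words = ["the", "bank", "of", "the", "river"]
vectors = {"the": [0.1, 0.0], "bank": [0.6, 0.5], "of": [0.0, 0.1], "river": [0.9, 0.9]}

weights = attention_weights(vectors["bank"], [vectors[w] for w in words])
print(words[weights.index(max(weights))])  # "bank" attends most to "river"
```

Because “bank” attends most strongly to “river,” a model using these weights can resolve it as a riverbank rather than a financial institution.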
Steps of Text Generation
According to Vincent, there are four primary steps for using AI to generate text:
- Training the Model: As it turns out, training the model takes a lot of text: more than 100,000 pieces of content with a minimum of 500 words each, all in the same language. Project Gutenberg, a massive collection of digitized written works, is one such source. This text can then be processed with the deep learning framework PyTorch into a form GPT-2 can later use to generate new text. It also takes considerable processing power to train your model; if you don’t own a supercomputer, expect at least a month on a typical GPU!
- Generating the Compressed Training Dataset: This step encodes the massive training corpus prepared with PyTorch into a compact format easily readable by GPT-2.
- Fine-Tuning the Model: Vincent tells us that we can further fine-tune our model by setting the vocabulary size, the embedding size, the size of the attention model, and the number of neural network layers.
- Generating Article Text: Now that we have a finely tuned model, we can employ GPT-2 to finally create some text. All we need is to define the opening sentence, how much we want the text to deviate from the model (often called the “temperature”), and how much text to generate.
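The final step above can be sketched with a toy character of its own. Vincent’s real pipeline uses a trained GPT-2 model; the word-bigram “model” below is a deliberately tiny stand-in, meant only to make the seed-sentence, temperature, and length controls concrete.

```python
import random
from collections import defaultdict

# Toy stand-in for a trained model: bigram counts from a tiny "corpus".
# (The real pipeline trains GPT-2 on 100,000+ documents.)
corpus = "the model generates the next word from the words before it".split()
counts = defaultdict(lambda: defaultdict(int))
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1

def generate(seed, length, temperature=1.0, rng_seed=0):
    """Autoregressive generation: start from a seed sentence and repeatedly
    sample the next word. Lower temperature sticks closer to the training
    data; higher temperature deviates more from it."""
    rng = random.Random(rng_seed)
    out = seed.split()
    for _ in range(length):
        nxt = counts.get(out[-1]) or counts["the"]  # crude fallback for dead ends
        choices, raw = zip(*nxt.items())
        weights = [c ** (1.0 / temperature) for c in raw]
        out.append(rng.choices(choices, weights=weights)[0])
    return " ".join(out)

print(generate("the", length=6, temperature=0.8))
```

GPT-2 does the same loop with a neural network predicting the next token instead of bigram counts, but the three knobs Vincent names (seed, temperature, length) play exactly these roles.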
While Vincent’s work was done in French and he didn’t have an English example, he swears by his model’s ability to create high-quality content.
Now It’s Your Turn
Toward the end of his presentation, Vincent issues a challenge to the audience: “Now it’s your turn.” He provided workable NLP models, a deep learning framework, and the steps to train a model and generate text: everything needed to generate our own text using AI.
That sounds like a lot of time, effort, and additional research if you’d never even heard of NLP before now.
However, consider this: using AI to generate text will allow SEOs to automate some of our most tedious tasks, such as generating anchor text, title tags, and meta descriptions.
While this method does advance the ability to automate article creation, Vincent tells us it is not yet reliable enough to forgo a full human review altogether. Instead, these automated articles can serve as an in-depth starting point for creating a unique piece of content, with most of the work already completed.
I’ll let you decide if 2019 is the best year for “…using AI for text generation.” However, Vincent is probably right, at least until further breakthroughs are made in 2020. So now it is your turn. Your turn to advance these techniques and perfect these models so that every year throughout the 2020s is the best year for using AI for text generation.
Watch Vincent’s Presentation & Other 2019 TechSEO Boost Sessions
Want more of Vincent’s TechSEO Boost 2019 presentation? You can watch all of the 2019 TechSEO Boost presentations, including Vincent’s, here on the Catalyst site. You can also get the speakers’ slides from the Catalyst SlideShare.