Alexa and GPT-3: First Exploration


After waiting a year or so for access to the API for OpenAI’s GPT-3 (an autoregressive language model that uses deep learning to produce human-like text), I threw together a first Alexa skill using GPT-3 in an afternoon.

Get the code on GitHub

Of the categories supported by GPT-3, I based my code on simple completion, as it requires none of the complex prompt examples that would be needed to train GPT-3 for more powerful completions. You could adapt this code with more elaborate prompt examples to explore other use cases of GPT-3. In my case, a simple “story-starting” prompt is all it takes to generate the next lines of the story. See: https://beta.openai.com/docs/introduction/completion

There are two basic intents in my skill: UtteranceIntent and ContinueIntent (in addition to the standard help and stop intents). Both are handled by the same handler. UtteranceIntent captures the free-form utterance, sends it to GPT-3, and responds with the utterance followed by the GPT-3 completion. The completion is stored in a session attribute. ContinueIntent is invoked when the user requests that the story continue, and feeds the previous completion (from the session attribute) back to GPT-3. The response is formed by the last sentence of the previous completion followed by the new completion. (It seemed a little less jarring to repeat the last bit of the previous completion, rather than starting, often in mid-sentence, with the continuation.)
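The “repeat the last sentence” trick for ContinueIntent can be sketched as a pair of helpers. (This is my own sketch, not the skill code verbatim: the helper names are mine, and the sentence splitting is deliberately naive.)

```javascript
// Naive sentence splitter: grabs the final "."/"!"/"?"-terminated
// sentence of the previous completion, so the continuation doesn't
// resume the story mid-sentence.
function lastSentence(text) {
  const sentences = text.match(/[^.!?]+[.!?]+/g);
  return sentences ? sentences[sentences.length - 1].trim() : text.trim();
}

// Response for ContinueIntent: the last sentence of the stored
// completion, followed by the fresh completion from GPT-3.
function buildContinuation(previousCompletion, newCompletion) {
  return `${lastSentence(previousCompletion)} ${newCompletion.trim()}`;
}
```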

Notes

You will need to provide your own GPT-3 key and store it in an environment variable (GPT3KEY) accessible from your Lambda function: https://share.hsforms.com/1Lfc7WtPLRk2ppXhPjcYY-A4sk30 Note that access to a key for experimentation gives you a fixed amount of GPT-3 credits ($18 US) for a limited time period (three months). Additional use requires a subscription starting at $100 US per month. Credits are applied based on the tokenization of the input and output, and on which GPT-3 model you are using. The default GPT-3 model for this skill is “ada”, which is the fastest and (by far) the cheapest GPT-3 model. You can set the model in a constant (GPT3MODEL), but be aware that the more powerful models (e.g. “davinci”) can be up to 75x more expensive. GPT-3 pricing by model is here: https://beta.openai.com/pricing
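In code, that setup amounts to a couple of lines at the top of the Lambda handler (the constant names match those described above; the comment about pricing reflects the ratio noted below):

```javascript
// API key comes from the Lambda environment; the model is a constant.
// "ada" keeps costs down; "davinci" can run up to ~75x the price.
const GPT3KEY = process.env.GPT3KEY;
const GPT3MODEL = "ada";
```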

The skill uses three of the available parameters for GPT-3 completion. Other parameters are documented at: https://beta.openai.com/docs/api-reference/completions. One key parameter is the number of “tokens” (based on how GPT-3 tokenizes text) in the response. I set it to 75, which offers a reasonably interesting length for an Alexa response; since you are charged by the token, I didn’t want to set it too high. Another parameter is “temperature”, and as soon as I actually read the documentation, I’ll let you know what that is!

User utterances are modeled as free-form input without any specific slots. The utterance intent is based on the method described here: https://stackoverflow.com/a/53334157
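Reading the captured utterance back out in the handler looks roughly like this, assuming the interaction model funnels the whole phrase into a single catch-all slot (the slot name "text" is my assumption, not necessarily what the skill uses):

```javascript
// Pulls the free-form utterance out of the Alexa request envelope.
// Assumes a single catch-all slot (here called "text"), per the
// Stack Overflow approach linked above.
function getUtterance(handlerInput) {
  const intent = handlerInput.requestEnvelope.request.intent || {};
  const slot = intent.slots && intent.slots.text;
  return slot && slot.value ? slot.value : "";
}
```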

The method for calling GPT-3 from node.js is derived from: https://www.twilio.com/blog/getting-started-with-openai-s-gpt-3-in-node-js. For the https POST call to GPT-3, I use the “got” library: https://www.npmjs.com/package/got
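A sketch of what the call looks like with got, split so that building the request is plain data. The endpoint URL and field names follow the beta “engines” completions API and should be treated as assumptions; max_tokens and temperature match the parameters discussed above.

```javascript
// Builds the URL and options for got.post() against the GPT-3
// completions endpoint. The exact endpoint path and field names
// are assumptions based on the beta API docs.
function buildCompletionRequest(prompt, apiKey, model = "ada") {
  return {
    url: `https://api.openai.com/v1/engines/${model}/completions`,
    options: {
      json: { prompt, max_tokens: 75, temperature: 0.7 },
      headers: { Authorization: `Bearer ${apiKey}` },
      responseType: "json",
    },
  };
}

// Usage with got (actual network call, not run here):
//   const got = require("got");
//   const { url, options } = buildCompletionRequest(storyPrompt, process.env.GPT3KEY);
//   const { body } = await got.post(url, options);
//   const completion = body.choices[0].text;
```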

I added a failsafe timeout mechanism so the skill doesn’t fail if the call to the GPT-3 API times out. This is based on the code here: https://levelup.gitconnected.com/promise-with-timeout-in-javascript-e42911ba23e1
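The failsafe boils down to racing the API promise against a timer, roughly as below (the function name and fallback argument are my own; the linked article describes the same Promise.race pattern):

```javascript
// Resolves with the GPT-3 result if it arrives within timeoutMs,
// otherwise resolves with a fallback value so the skill can still
// respond instead of erroring out.
function promiseWithTimeout(promise, timeoutMs, fallback) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), timeoutMs);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```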

GPT-3 recommends (and I believe Alexa certification would require) filtering the output to remove offensive language. My skill code uses the bad-words filter: https://www.npmjs.com/package/bad-words

Thoughts

Though this was fairly easy to cobble together, I don’t think I’ll be jumping into publishing a skill with it any time soon. I’ll continue to experiment, but the cost of maintaining a skill at $100/month is prohibitive, unless I can monetize a killer skill somehow. Maybe, but for now I’ll continue fiddling about.

I was able to jump in quickly without reading too much of the GPT-3 documentation. Unfortunately, the sample code I started from used the GPT-3 DaVinci model, which, while touted as the most powerful, is also the most expensive. I ran through 83¢ of my $18 allotment in the first afternoon before discovering that the Ada model was 75x less expensive, and faster.

Is it any good? YMMV, but there are more interesting, story-like completions than failures, though there are a few, such as when it generates what looks like a product review, or something that isn’t really a story but a list of some sort. GPT-3 can do so many things that half the fun of getting access is imagining how they could be used in the context of an Alexa skill. Though I’m not likely to publish a skill in the near future, my next two explorations are likely to be:

  • Extending the skill into more of a voice-driven GPT-3 playground (or something like this, if you don’t have an account) where you can request model or parameter settings to be set or changed with voice commands, e.g. “Alexa, set temperature to zero point seven” (whatever that means).
  • As I’ve been playing with emotion/sentiment analysis APIs in skills, I’d like to go beyond crafting SSML styles to provide appropriate responses based on detected emotions. I can imagine feeding GPT-3 with model prompts that could do that.

How would you integrate GPT-3 into an Alexa skill?

Get the code on GitHub
