NLP with MindsDB and OpenAI
MindsDB NLP Supported Tasks
MindsDB lets you create models that utilize features provided by OpenAI GPT-3. Currently, there are three operation modes:
- Answering Questions without Context
- Answering Questions with Context
- Prompt Completion
Currently, MindsDB’s NLP engine is powered by Hugging Face and OpenAI, but we plan to expand to other NLP options in the future, so stay tuned!
Fine-tune an OpenAI model with MindsDB
All OpenAI models belong to the group of Large Language Models (LLMs). By definition, these are pre-trained on large amounts of data. However, it is possible to fine-tune these models with a task-specific dataset for a defined use case.
OpenAI supports fine-tuning of some of its models listed here. And with MindsDB, you can easily fine-tune an OpenAI model, making it more applicable to your specific use case.
How to Bring the OpenAI Model to MindsDB
We use the CREATE ML_ENGINE and CREATE MODEL statements to bring the OpenAI models to MindsDB.
We first create the openai engine by providing the openai_api_key:
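A minimal sketch of the statement, where the engine name openai_engine and the placeholder key value are illustrative:

```sql
CREATE ML_ENGINE openai_engine
FROM openai
USING
    openai_api_key = 'your-openai-api-key';
```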
Next, use this engine to create the model as:
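For instance, a model for the question-answering mode might look like this sketch, where the model name and the question_column value are illustrative:

```sql
CREATE MODEL openai_model
PREDICT answer
USING
    engine = 'openai_engine',
    question_column = 'question';
```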
Follow these instructions to set up the OpenAI integration in MindsDB.
Example
For more examples and explanations, visit our doc page on OpenAI.
Example using SQL
Let’s go through a sentiment classification example to understand better how to bring OpenAI models to MindsDB as AI tables.
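A sketch of the statement, using the project, model, and parameter values listed in the table that follows:

```sql
CREATE MODEL mindsdb.sentiment_classifier
PREDICT sentiment
USING
    engine = 'openai',
    prompt_template = 'predict the sentiment of the text:{{review}} exactly as either positive or negative or neutral';
```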
On execution, we get:
Where:
| Expressions | Values |
| --- | --- |
| project_name | mindsdb |
| predictor_name | sentiment_classifier |
| target_column | sentiment |
| engine | openai |
| prompt_template | predict the sentiment of the text:{{review}} exactly as either positive or negative or neutral |
In the prompt_template parameter, we use a placeholder for a text value that comes from the review column, that is, text:{{ review }}.
Before querying for predictions, we should verify the status of the sentiment_classifier model.
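One way to check the status is with a DESCRIBE statement; a sketch:

```sql
DESCRIBE sentiment_classifier;
```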
On execution, we get:
Once the status is complete, we can query for predictions.
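A sketch of a batch prediction query, assuming example_db contains a demo_data.amazon_reviews table (the table name is illustrative):

```sql
SELECT input.review, output.sentiment
FROM example_db.demo_data.amazon_reviews AS input
JOIN mindsdb.sentiment_classifier AS output
LIMIT 3;
```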
Don’t forget to create the example_db database before using one of its tables, like in the query above.
On execution, we get:
For the full library of supported examples, please go here.
Example using MQL
Now let’s go through a sentiment classification example using MongoDB syntax.
We have a sample Mongo database that you can connect to your MindsDB Cloud account by running this command in Mongo Shell:
Followed by:
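A hypothetical sketch of the two steps, with placeholder connection details (the real command and connection string are in the MindsDB docs):

```
use mindsdb
db.databases.insertOne({
    name: "mongo_test_db",
    engine: "mongodb",
    connection_args: {
        host: "mongodb+srv://<user>:<password>@<sample-host>/",
        database: "public"
    }
})
```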
We use this sample database throughout the example.
The next step is to create a connection between Mongo and MindsDB. Follow the instructions to connect MindsDB via Mongo Compass or Mongo Shell.
Now, we are ready to create an OpenAI model.
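A sketch of the model creation command, reusing the prompt from the SQL example (the training_options shape follows MindsDB’s Mongo API):

```
db.models.insertOne({
    name: "sentiment_classifier",
    predict: "sentiment",
    training_options: {
        engine: "openai",
        prompt_template: "predict the sentiment of the text:{{review}} exactly as either positive or negative or neutral"
    }
})
```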
On execution, we get:
Before querying for predictions, we should verify the status of the sentiment_classifier model.
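One way to check the status is to look the model up in the models collection; a sketch:

```
db.models.find({ name: "sentiment_classifier" })
```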
On execution, we get:
Once the status is complete, we can query for a single prediction.
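A sketch of a single prediction, passing the input text directly in the query (the review value is illustrative):

```
db.sentiment_classifier.find({ review: "It is ok." })
```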
On execution, we get:
You can also query for batch predictions. Here we use the mongo_test_db database connected earlier in this example.
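A hypothetical sketch of a batch prediction, assuming mongo_test_db contains an amazon_reviews collection (the collection name is illustrative):

```
db.sentiment_classifier.find({
    collection: "mongo_test_db.amazon_reviews"
})
```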
On execution, we get:
For the full library of supported examples, please go here.
Parameter descriptions
MindsDB lets you customize models using parameters provided by OpenAI. Currently, there are eleven parameters to optionally modify:
- model_name: An optional string that identifies the model to use. It defaults to the text-davinci-002 model. For a list of available models and their descriptions, visit: Model overview
- max_tokens: The maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens cannot exceed the model’s context length.
- temperature: What sampling temperature to use. Higher values mean the model will take more risks.
- top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
- n: How many completions to generate for each prompt.
- stop: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
- presence_penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.
- frequency_penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.
- best_of: Generates best_of completions server-side and returns the “best” (the one with the highest log probability per token). Results cannot be streamed. When used with n, best_of controls the number of candidate completions and n specifies how many to return – best_of must be greater than n.
- logit_bias: Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. The exact effect will vary per model, but values between -1 and 1 should decrease or increase the likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
- user: A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.
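Several of these parameters can be combined in the USING clause of CREATE MODEL. A sketch, where the specific parameter values are illustrative:

```sql
CREATE MODEL mindsdb.sentiment_classifier
PREDICT sentiment
USING
    engine = 'openai',
    model_name = 'text-davinci-002',
    max_tokens = 100,
    temperature = 0.3,
    prompt_template = 'predict the sentiment of the text:{{review}} exactly as either positive or negative or neutral';
```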
What’s Next?
Have fun while trying it out yourself!
- Bookmark MindsDB repository on GitHub.
- Sign up for a free MindsDB account.
- Engage with the MindsDB community on Slack or GitHub to ask questions and share your ideas and thoughts.
If this tutorial was helpful, please give us a GitHub star here.