This is the implementation of the Pinecone for MindsDB.

Pinecone is a vector database which is fully-managed, developer-friendly, and easily scalable.

Prerequisites

Before proceeding, ensure the following prerequisites are met:

  1. Install MindsDB locally via Docker or use MindsDB Cloud.
  2. To connect Pinecone to MindsDB, install the required dependencies following this instruction.
  3. Install or ensure access to Pinecone.

Implementation

This handler uses pinecone-client python library connect to a pinecone environment.

The required arguments to establish a connection are:

  • api_key: the API key that can be found in your pinecone account
  • environment: the environment name corresponding to the api_key

These optional arguments are used with CREATE statements:

  • dimension: dimensions of the vectors to be stored in the index (default=8)
  • metric: distance metric to be used for similarity search (default=‘cosine’)
  • pods: number of pods for the index to use, including replicas (default=1)
  • replicas: the number of replicas. replicas duplicate your index. they provide higher availability and throughput (default=1)
  • pod_type: the type of pod to use, refer to pinecone documentation (default=‘p1’)

Limitations

  • DROP TABLE support
  • Support for namespaces
  • Display score/distance
  • Support for creating/reading sparse values
  • content column is not supported since it does not exist in Pinecone

Usage

In order to make use of this handler and connect to an environment, use the following syntax:

CREATE DATABASE pinecone_dev
WITH ENGINE = "pinecone",
PARAMETERS = {
   "api_key": "...",
   "environment": "..."
};

You can query pinecone indexes (temp in the following examples) based on id or search_vector, but not both:

SELECT * from pinecone_dev.temp
WHERE id = "abc"
LIMIT 1
SELECT * from pinecone_dev.temp
WHERE search_vector = "[1,2,3,4,5,6,7,8]"

If you are using subqueries, make sure that the result is only a single row since the use of multiple search vectors is not allowed

SELECT * from pinecone_database.temp
WHERE search_vector = (
    SELECT embeddings FROM sqlitetesterdb.test WHERE id = 10
)

Optionally, you can filter based on metadata too:

SELECT * from pinecone_dev.temp
WHERE id = "abc" AND temp.metadata.hello < 100

You can delete records using id or metadata like so:

DELETE FROM pinecone_dev.temp
WHERE id = "abc"

Note that deletion through metadata is not supported in starter tier

DELETE FROM pinecone_dev.temp
WHERE temp.metadata.tbd = true

You can insert data into a new collection like so:

CREATE TABLE pinecone_dev.temp (
SELECT * FROM mysql_demo_db.temp LIMIT 10);

To update records, you can use insert statement. When there is a conflicting ID in pinecone index, the record is updated with new values. It might take a bit to see it reflected.

INSERT INTO pinecone_test.testtable (id,content,metadata,embeddings)
VALUES (
    'id1', 'this is a test', '{"test": "test"}', '[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]'
);