pinecone

Store and retrieve vectors from Pinecone vector database.

The Pinecone module provides functions for vector operations including upsert, query, fetch, and delete. This enables semantic search, similarity matching, and RAG (Retrieval Augmented Generation) capabilities in workflows.


Overview

Pinecone is a managed vector database optimized for similarity search. The module automatically:

  • Organizes vectors by organization/environment namespace
  • Handles authentication and API communication
  • Supports both entity argument and environment variable configuration

Namespace Format:

{namespacePrefix}/{organizationId}/{environmentId}/{customNamespace}
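The sketch below shows how a namespace string in this format might be assembled. `buildNamespace` is a hypothetical helper and is not part of the module. This page does not say what happens when the prefix or custom namespace is empty, so the sketch assumes empty segments are simply left out.

```javascript
// Hypothetical helper illustrating the documented namespace format:
// {namespacePrefix}/{organizationId}/{environmentId}/{customNamespace}
// Assumption: empty or missing optional segments (prefix, custom namespace) are dropped.
function buildNamespace(prefix, organizationId, environmentId, customNamespace) {
  if (!organizationId || !environmentId) {
    // Mirrors the error message the module documents for missing IDs.
    throw new Error('message.organizationId and message.environmentId are required');
  }
  return [prefix, organizationId, environmentId, customNamespace]
    .filter(Boolean) // drop undefined, null and '' segments
    .join('/');
}

console.log(buildNamespace('prod', 'org-1', 'env-2', 'events')); // prod/org-1/env-2/events
console.log(buildNamespace('', 'org-1', 'env-2', ''));           // org-1/env-2
```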

Configuration

Pinecone can be configured via entity arguments (recommended) or environment variables.

Entity Arguments (Preferred)

Set these as arguments on your workflow entity:

| Argument Name | Description |
|---|---|
| pineconeApiKey | Your Pinecone API key |
| pineconeIndexHost | Index host URL (e.g., my-index-abc123.svc.us-east1-gcp.pinecone.io) |
| pineconeNamespacePrefix | Optional prefix for namespaces |

Example Entity Configuration:

{
  "arguments": [
    {
      "argumentName": "pineconeApiKey",
      "argumentValue": "{{PINECONE_API_KEY}}",
      "argumentDescription": "Pinecone API key from environment"
    },
    {
      "argumentName": "pineconeIndexHost",
      "argumentValue": "{{PINECONE_INDEX_HOST}}",
      "argumentDescription": "Pinecone index host"
    }
  ]
}

Environment Variables (Fallback)

| Variable | Description |
|---|---|
| PINECONE_API_KEY | Your Pinecone API key |
| PINECONE_INDEX_HOST | Index host URL |
| PINECONE_NAMESPACE_PREFIX | Optional namespace prefix |

Entity arguments take precedence over environment variables.
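This precedence rule can be sketched as a small resolver. `argsToMap` and `resolvePineconeConfig` are hypothetical and are not the module's code. The argument and variable names come from the tables above, and the error text mirrors the one listed under Error Handling. The environment is passed in as a plain object so the sketch runs on its own. In Node.js you might pass `process.env`.

```javascript
// Hypothetical sketch: entity arguments first, environment variables as the fallback.

// Convert an entity "arguments" array (shaped like the JSON example) into a name -> value map.
function argsToMap(argumentsArray = []) {
  return Object.fromEntries(argumentsArray.map((a) => [a.argumentName, a.argumentValue]));
}

function resolvePineconeConfig(entityArgs = {}, env = {}) {
  // An entity argument wins; an empty or missing one falls back to the env var.
  const pick = (argName, envName) => entityArgs[argName] || env[envName];
  const config = {
    apiKey: pick('pineconeApiKey', 'PINECONE_API_KEY'),
    indexHost: pick('pineconeIndexHost', 'PINECONE_INDEX_HOST'),
    namespacePrefix: pick('pineconeNamespacePrefix', 'PINECONE_NAMESPACE_PREFIX') || '',
  };
  if (!config.apiKey || !config.indexHost) {
    throw new Error('Pinecone API key and index host are required');
  }
  return config;
}

const entityArgs = argsToMap([{ argumentName: 'pineconeApiKey', argumentValue: 'key-from-entity' }]);
const env = { PINECONE_API_KEY: 'key-from-env', PINECONE_INDEX_HOST: 'my-index.example' };
const config = resolvePineconeConfig(entityArgs, env);
console.log(config.apiKey, config.indexHost); // key-from-entity my-index.example
```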


pineconeUpsert

Insert or update vectors in Pinecone.

Signature

await pineconeUpsert(vectors, namespace)

Description

Upserts one or more vectors to Pinecone. Each vector must have an id and values (the embedding). It may also include metadata. If a vector with the same id already exists in the namespace, the upsert overwrites it.

Parameters

| Parameter | Type | Description |
|---|---|---|
| vectors | object[] | Array of vector objects |
| namespace | string | Optional custom namespace (appended to org/env) |

Vector Object Structure:

{
  id: string;         // Unique identifier
  values: number[];   // Embedding vector (float array)
  metadata?: object;  // Optional metadata for filtering
}

Returns

Promise<{upsertedCount: number}> — Number of vectors upserted.
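A malformed vector usually surfaces only as an HTTP 400 from the API. A hypothetical pre-flight check like the one below catches structural problems earlier. It checks the vector object structure described above and nothing more. Matching the index dimension is a separate concern.

```javascript
// Hypothetical pre-flight check for the vector object structure.
// Returns a list of problems; an empty list means the vector looks well-formed.
function validateVector(vector) {
  const problems = [];
  if (typeof vector.id !== 'string' || vector.id.length === 0) {
    problems.push('id must be a non-empty string');
  }
  if (!Array.isArray(vector.values) || vector.values.length === 0) {
    problems.push('values must be a non-empty array');
  } else if (!vector.values.every((v) => typeof v === 'number' && Number.isFinite(v))) {
    problems.push('values must contain only finite numbers');
  }
  const m = vector.metadata;
  if (m !== undefined && (typeof m !== 'object' || m === null || Array.isArray(m))) {
    problems.push('metadata, if present, must be a plain object');
  }
  return problems;
}

console.log(validateVector({ id: 'event-1', values: [0.1, 0.2] }).length); // 0
console.log(validateVector({ id: '', values: [NaN] }).length);            // 2
```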

Example

// Generate embedding for content
const embedding = await createEmbedding(message.content);

// Upsert single vector
await pineconeUpsert([{
  id: `event-${message.eventId}`,
  values: embedding,
  metadata: {
    type: message.type,
    timestamp: Date.now(),
    content: message.content.substring(0, 500) // Store preview
  }
}]);

// Upsert multiple vectors
const vectors = messages.map(msg => ({
  id: `msg-${msg.id}`,
  values: msg.embedding,
  metadata: { source: msg.source }
}));
await pineconeUpsert(vectors, 'messages');
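For large arrays, it can help to split the upsert into batches. Pinecone's documentation lists per-request upsert limits: at the time of writing, 1,000 records or 2 MB, whichever comes first. This page does not say whether `pineconeUpsert` batches internally. The hypothetical `upsertInBatches` below batches explicitly. The upsert function is passed in so the sketch runs on its own; in a workflow you would pass `pineconeUpsert`. The default batch size of 100 is an assumption. Tune it to your embedding dimension and metadata size.

```javascript
// Hypothetical helper: upsert a large array in fixed-size batches.
// `upsertFn` stands in for pineconeUpsert so the sketch is self-contained.
async function upsertInBatches(vectors, namespace, upsertFn, batchSize = 100) {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error('batchSize must be a positive integer');
  }
  let total = 0;
  for (let start = 0; start < vectors.length; start += batchSize) {
    const batch = vectors.slice(start, start + batchSize);
    // Batches are sent one after another to keep request volume predictable.
    const result = await upsertFn(batch, namespace);
    total += result.upsertedCount;
  }
  return { upsertedCount: total };
}

// Usage with a stand-in upsert function that reports each batch's size.
const fakeUpsert = async (batch) => ({ upsertedCount: batch.length });
const sample = Array.from({ length: 250 }, (_, i) => ({ id: `msg-${i}`, values: [i, i + 1] }));
upsertInBatches(sample, 'messages', fakeUpsert).then((r) => console.log(r.upsertedCount)); // 250
```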

pineconeQuery

Query Pinecone for similar vectors.

Signature

await pineconeQuery(vector, topK, namespace, filter)

Description

Finds the vectors most similar to the query vector, using the similarity metric the index was created with (cosine, dot product, or Euclidean). Returns scored matches with metadata.

Parameters

| Parameter | Type | Description |
|---|---|---|
| vector | number[] | Query embedding vector |
| topK | number | Number of results to return (default: 10) |
| namespace | string | Optional custom namespace (appended to org/env) |
| filter | object | Optional metadata filter conditions |

Returns

Promise<{matches: Array<{id: string, score: number, metadata?: object}>}> — Scored matches.
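On a cosine index, each match's score is the cosine similarity between the query vector and the stored vector. It is 1 for vectors pointing the same way, 0 for orthogonal vectors, and -1 for opposite vectors. The plain-JavaScript function below illustrates that formula for local checks and tests. It is not how Pinecone computes scores internally. Dot-product and Euclidean indexes produce differently scaled scores.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error('cosine similarity is undefined for zero vectors');
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0]));   // 1
console.log(cosineSimilarity([1, 0], [0, 1]));   // 0
console.log(cosineSimilarity([3, 4], [-3, -4])); // -1
```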

Example

// Simple similarity search
const queryEmbedding = await createEmbedding('touchdown celebration');
const results = await pineconeQuery(queryEmbedding, 5);

for (const match of results.matches) {
  print(`${match.id}: ${match.score.toFixed(3)}`);
  print(`  Content: ${match.metadata?.content}`);
}

// Filtered search (empty string = no custom namespace)
const filtered = await pineconeQuery(queryEmbedding, 10, '', {
  type: { $eq: 'touchdown' },
  timestamp: { $gt: Date.now() - 86400000 } // Last 24 hours
});

// Search specific namespace
const archived = await pineconeQuery(queryEmbedding, 3, 'archive-2024');
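The time window in the filtered-search example is plain millisecond arithmetic: 24 h × 3,600,000 ms/h = 86,400,000 ms. The hypothetical builder below makes that explicit and takes `now` as a parameter so the result is deterministic. It assumes timestamps are stored as epoch milliseconds, as in the upsert example's `Date.now()`.

```javascript
// Hypothetical helper building the "type X within the last N hours" filter.
const MS_PER_HOUR = 60 * 60 * 1000; // 3,600,000 ms

function recentOfTypeFilter(type, hours, now = Date.now()) {
  return {
    type: { $eq: type },
    timestamp: { $gt: now - hours * MS_PER_HOUR },
  };
}

console.log(JSON.stringify(recentOfTypeFilter('touchdown', 24, 100000000)));
// {"type":{"$eq":"touchdown"},"timestamp":{"$gt":13600000}}
```

In a workflow this would be passed as the fourth argument, for example `pineconeQuery(queryEmbedding, 10, '', recentOfTypeFilter('touchdown', 24))`.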

pineconeFetch

Fetch vectors by their IDs.

Signature

await pineconeFetch(ids, namespace)

Description

Retrieves specific vectors by their IDs. More efficient than querying when you know the exact IDs.

Parameters

| Parameter | Type | Description |
|---|---|---|
| ids | string[] | Array of vector IDs to fetch |
| namespace | string | Optional custom namespace |

Returns

Promise<{vectors: Record<string, {id: string, values: number[], metadata?: object}>}> — Map of ID to vector data.

Example

// Fetch specific vectors
const result = await pineconeFetch([
  'event-12345',
  'event-12346',
  'event-12347'
]);

for (const [id, vector] of Object.entries(result.vectors)) {
  print(`${id}: ${vector.metadata?.content}`);
}

// Check if vector exists
const existing = await pineconeFetch([`event-${message.eventId}`]);
if (Object.keys(existing.vectors).length > 0) {
  print('Vector already exists');
}
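The existence check above generalizes to several IDs. The hypothetical `missingIds` helper below returns the requested IDs that are absent from a fetch result. Like the example above, it assumes IDs that are not found are simply left out of `vectors` rather than causing an error.

```javascript
// Hypothetical helper: IDs that were requested but not returned by pineconeFetch.
function missingIds(requestedIds, fetchResult) {
  const found = fetchResult.vectors || {};
  return requestedIds.filter((id) => !Object.prototype.hasOwnProperty.call(found, id));
}

// Usage with a result shaped like the documented return value.
const fetched = {
  vectors: {
    'event-12345': { id: 'event-12345', values: [0.1, 0.2], metadata: { content: 'TD' } },
  },
};
console.log(missingIds(['event-12345', 'event-12346'], fetched)); // [ 'event-12346' ]
```

One use is indexing only the events that are not stored yet, instead of re-embedding everything.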

pineconeDelete

Delete vectors from Pinecone by ID.

Signature

await pineconeDelete(ids, namespace)

Description

Deletes specific vectors by their IDs.

Parameters

| Parameter | Type | Description |
|---|---|---|
| ids | string[] | Array of vector IDs to delete |
| namespace | string | Optional custom namespace |

Returns

Promise<{}> — Empty object on success.

Example

// Delete specific vectors
await pineconeDelete(['event-12345', 'event-12346']);

// Delete from specific namespace
await pineconeDelete(['event-99999'], 'temp-data');
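Fetch and delete match IDs exactly, so an ID must be spelled the same way it was at upsert time. One way to keep this consistent is to build every ID with a single helper. `vectorId` below is hypothetical. It reproduces the `event-`, `msg-` and `content-` prefixes used in the examples on this page.

```javascript
// Hypothetical ID helper so upsert, fetch and delete all use the same scheme.
const ID_PREFIXES = { event: 'event', message: 'msg', content: 'content' };

function vectorId(kind, sourceId) {
  if (!Object.prototype.hasOwnProperty.call(ID_PREFIXES, kind)) {
    throw new Error(`unknown vector kind: ${kind}`);
  }
  if (sourceId === undefined || sourceId === null || sourceId === '') {
    throw new Error('sourceId is required');
  }
  return `${ID_PREFIXES[kind]}-${sourceId}`;
}

console.log(vectorId('event', 12345));   // event-12345
console.log(vectorId('message', 'abc')); // msg-abc
// In a workflow, for example: await pineconeDelete([vectorId('event', 12345)]);
```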

Use Cases

1. Semantic Search for Context

Find relevant historical content for prompt enrichment:

// Generate embedding for current event
const queryEmbedding = await createEmbedding(
`${message.payload.event.description} ${message.payload.event.play_type}`
);

// Find similar past events in the 'events' namespace
const similar = await pineconeQuery(queryEmbedding, 3, 'events');

// Build context for prompt
let context = 'Similar past events:\n';
for (const match of similar.matches) {
  context += `- ${match.metadata?.content}\n`;
}

// Use context in prompt
var prompt = `
Based on these similar past events:
${context}

Generate a social media post for: ${message.payload.event.description}
`;
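If the current event is already indexed in the namespace you query, the closest match is likely to be the event itself. The hypothetical `excludeSelf` filter below drops that match, along with any match that has no text content. It assumes the current event's vector ID follows the same `event-<id>` scheme used at upsert time.

```javascript
// Hypothetical helper: remove the query's own vector from the matches and
// keep only matches that carry text content for the prompt.
function excludeSelf(matches, selfId) {
  return matches.filter(
    (match) => match.id !== selfId && typeof match.metadata?.content === 'string'
  );
}

const matches = [
  { id: 'event-1', score: 0.99, metadata: { content: 'current play' } },
  { id: 'event-7', score: 0.91, metadata: { content: 'similar play' } },
  { id: 'event-9', score: 0.88 }, // no metadata, skipped
];
console.log(excludeSelf(matches, 'event-1').map((m) => m.id)); // [ 'event-7' ]
```

Because one result may be dropped, you can query with a `topK` one larger than needed, for example `pineconeQuery(queryEmbedding, 4, 'events')`, and still keep three entries of context.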

2. Deduplication with Similarity

Prevent near-duplicate content:

const embedding = await createEmbedding(message.content);

// Check for similar existing content
const similar = await pineconeQuery(embedding, 1);

// A 0.95 threshold assumes a cosine-metric index (scores at most 1)
if (similar.matches.length > 0 && similar.matches[0].score > 0.95) {
  print('Content too similar to existing:', similar.matches[0].id);
  return; // Skip processing
}

// Content is unique enough, proceed
await pineconeUpsert([{
  id: `content-${message.id}`,
  values: embedding,
  metadata: { content: message.content }
}]);
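The duplicate check can be isolated so the threshold is named and testable. `isNearDuplicate` is hypothetical. Its default threshold is the 0.95 from the example above, not a recommendation. It assumes matches are ordered best-first and that a higher score means more similar.

```javascript
// Hypothetical near-duplicate check on a pineconeQuery result.
function isNearDuplicate(queryResult, threshold = 0.95) {
  const top = queryResult.matches?.[0]; // best match, if any
  return top !== undefined && top.score > threshold;
}

console.log(isNearDuplicate({ matches: [{ id: 'content-1', score: 0.97 }] })); // true
console.log(isNearDuplicate({ matches: [{ id: 'content-1', score: 0.8 }] }));  // false
console.log(isNearDuplicate({ matches: [] }));                                 // false
```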

3. RAG (Retrieval Augmented Generation)

Enhance AI prompts with relevant documentation:

// User's question
const question = message.payload.question;
const questionEmbedding = await createEmbedding(question);

// Search documentation
const docs = await pineconeQuery(questionEmbedding, 5, 'documentation');

// Build RAG context
let context = '';
for (const doc of docs.matches) {
  context += `${doc.metadata?.title}:\n${doc.metadata?.content}\n\n`;
}

// Generate answer with context
var prompt = `
Answer the following question using only the provided context.

Context:
${context}

Question: ${question}

Answer:
`;

await executePromptWithModel();
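Concatenating every match can produce more context than the model accepts. The sketch below caps the context at a character budget. It adds whole documents in score order and stops at the first one that would not fit. `buildRagContext` and its 4,000-character default are assumptions for illustration. Characters are only a rough proxy for model tokens.

```javascript
// Hypothetical helper: build RAG context from query matches, highest score first,
// stopping before the character budget would be exceeded.
function buildRagContext(matches, maxChars = 4000) {
  const sorted = [...matches].sort((a, b) => b.score - a.score);
  let context = '';
  for (const doc of sorted) {
    const content = doc.metadata?.content;
    if (typeof content !== 'string') continue; // nothing useful to include
    const title = doc.metadata?.title ?? doc.id;
    const section = `${title}:\n${content}\n\n`;
    if (context.length + section.length > maxChars) break;
    context += section;
  }
  return context;
}

const docs = [
  { id: 'doc-2', score: 0.71, metadata: { title: 'Limits', content: 'Batch large upserts.' } },
  { id: 'doc-1', score: 0.84, metadata: { title: 'Setup', content: 'Set pineconeApiKey.' } },
];
console.log(buildRagContext(docs, 40)); // only the higher-scoring "Setup" section fits
```

Stopping at the first document that does not fit keeps the context strictly in relevance order. Skipping it and trying smaller, lower-scoring documents would fill the budget more completely, but at the cost of relevance.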

4. Content Indexing Pipeline

Index new content as it arrives:

// Only index certain event types
if (!['touchdown', 'field_goal', 'interception'].includes(message.payload.event.play_type)) {
  return;
}

// Generate embedding
const content = `${message.payload.event.description}. ` +
  `${message.payload.event.player?.name || ''} ${message.payload.event.team?.name || ''}`;
const embedding = await createEmbedding(content);

// Upsert to Pinecone
await pineconeUpsert([{
  id: `event-${message.payload.event.id}`,
  values: embedding,
  metadata: {
    playType: message.payload.event.play_type,
    gameId: message.payload.game.id,
    timestamp: message.payload.timestamp,
    content: content
  }
}], 'events');

print('Indexed event:', message.payload.event.id);
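Pinecone's range operators ($gt, $gte, $lt, $lte) compare numbers. A timestamp stored as an ISO-8601 string therefore cannot be used in a time-range filter. This page does not specify the format of `message.payload.timestamp`. If it may be a string, normalizing it before upsert keeps it filterable. `toEpochMillis` is a hypothetical helper.

```javascript
// Hypothetical helper: normalize a timestamp to epoch milliseconds (a number)
// so it can be used with $gt/$gte/$lt/$lte metadata filters.
function toEpochMillis(timestamp) {
  if (typeof timestamp === 'number' && Number.isFinite(timestamp)) {
    return timestamp; // assumed to already be epoch milliseconds
  }
  if (typeof timestamp === 'string') {
    const parsed = Date.parse(timestamp);
    if (!Number.isNaN(parsed)) return parsed;
  }
  throw new Error(`unrecognized timestamp: ${timestamp}`);
}

console.log(toEpochMillis('1970-01-01T00:00:01Z')); // 1000
console.log(toEpochMillis(1700000000000));          // 1700000000000
```

In the pipeline above, this would be used as `timestamp: toEpochMillis(message.payload.timestamp)`.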

Filter Syntax

Pinecone supports metadata filtering with these operators:

| Operator | Description | Example |
|---|---|---|
| $eq | Equal | { type: { $eq: 'touchdown' } } |
| $ne | Not equal | { type: { $ne: 'timeout' } } |
| $gt | Greater than | { score: { $gt: 7 } } |
| $gte | Greater than or equal | { quarter: { $gte: 3 } } |
| $lt | Less than | { timestamp: { $lt: 1234567890 } } |
| $lte | Less than or equal | { quarter: { $lte: 2 } } |
| $in | In array | { type: { $in: ['touchdown', 'field_goal'] } } |
| $nin | Not in array | { type: { $nin: ['timeout', 'penalty'] } } |

Combining Filters:

const results = await pineconeQuery(embedding, 10, '', {
  $and: [
    { type: { $eq: 'touchdown' } },
    { quarter: { $gte: 3 } },
    { team: { $in: ['Chiefs', 'Eagles'] } }
  ]
});
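When conditions are assembled dynamically, a small builder can produce this $and form. `andFilter` is hypothetical. It wraps plain values in $eq, turns arrays into $in, and passes operator objects such as `{ $gte: 3 }` through unchanged.

```javascript
// Hypothetical builder: { field: value | array | { $op: value } } -> { $and: [...] }
function andFilter(conditions) {
  const clauses = Object.entries(conditions).map(([field, condition]) => {
    if (Array.isArray(condition)) {
      return { [field]: { $in: condition } };  // lists become membership tests
    }
    if (condition !== null && typeof condition === 'object') {
      return { [field]: condition };           // already an operator object
    }
    return { [field]: { $eq: condition } };    // plain values become equality tests
  });
  return { $and: clauses };
}

const filter = andFilter({ type: 'touchdown', quarter: { $gte: 3 }, team: ['Chiefs', 'Eagles'] });
console.log(JSON.stringify(filter));
// {"$and":[{"type":{"$eq":"touchdown"}},{"quarter":{"$gte":3}},{"team":{"$in":["Chiefs","Eagles"]}}]}
```

The result is the same filter as the example above and could be passed as `pineconeQuery(embedding, 10, '', filter)`.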

Best Practices

Embedding Dimensions

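The embedding length must equal the dimension the Pinecone index was created with. Pinecone rejects upserts and queries whose vectors have a different length.

  • Amazon Titan Text Embeddings V2 (amazon.titan-embed-text-v2:0): 1024 dimensions by default (256 and 512 are also available)
  • OpenAI text-embedding-3-small: 1536 dimensions by default (can be reduced with the dimensions request parameter)
  • OpenAI text-embedding-ada-002: 1536 dimensions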
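A cheap guard before calling Pinecone turns a dimension mismatch into a clear local error instead of an HTTP 400. The sketch below is hypothetical. The model-to-dimension map restates the defaults listed above. If you request a reduced size from the embedding model, use that size instead.

```javascript
// Hypothetical dimension guard; values are the default sizes listed above.
const DEFAULT_DIMENSIONS = {
  'amazon.titan-embed-text-v2:0': 1024,
  'text-embedding-3-small': 1536,
  'text-embedding-ada-002': 1536,
};

function assertDimension(embedding, indexDimension) {
  if (embedding.length !== indexDimension) {
    throw new Error(
      `embedding has ${embedding.length} dimensions but the index expects ${indexDimension}`
    );
  }
  return embedding;
}

const embedding = new Array(DEFAULT_DIMENSIONS['text-embedding-3-small']).fill(0.01);
assertDimension(embedding, 1536); // passes; a 1024-dimension index would throw
```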

Metadata Guidelines

  1. Store searchable fields: the fields you'll filter on
  2. Include content previews: useful for debugging and display
  3. Add timestamps: stored as numbers, for time-based queries
  4. Keep metadata small: Pinecone limits metadata size per vector (40 KB at the time of writing)
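A hypothetical way to respect that limit is to measure the serialized metadata and trim the longest text field until it fits. This page does not say how Pinecone counts metadata bytes, so the budget is a parameter and UTF-8 JSON length is used only as an estimate.

```javascript
// Hypothetical helpers: estimate metadata size (UTF-8 bytes of its JSON form)
// and trim one string field until the metadata fits the budget.
function metadataBytes(metadata) {
  return new TextEncoder().encode(JSON.stringify(metadata)).length;
}

function fitMetadata(metadata, field, maxBytes) {
  const result = { ...metadata };
  while (metadataBytes(result) > maxBytes &&
         typeof result[field] === 'string' && result[field].length > 0) {
    // Drop about 10% of the remaining text per pass (always at least one character).
    result[field] = result[field].slice(0, Math.floor(result[field].length * 0.9));
  }
  if (metadataBytes(result) > maxBytes) {
    throw new Error(`metadata still exceeds ${maxBytes} bytes`);
  }
  return result;
}

const meta = fitMetadata({ type: 'touchdown', content: 'x'.repeat(50000) }, 'content', 40 * 1024);
console.log(metadataBytes(meta) <= 40 * 1024); // true
```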

Namespace Organization

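{prefix}/
  {org}/{env}/
    events/          # Event embeddings
    documentation/   # RAG documents
    responses/       # Generated content

Pinecone namespaces are flat strings. The hierarchy exists only in the naming convention, for example {prefix}/{org}/{env}/events.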

Error Handling

try {
  await pineconeUpsert(vectors);
} catch (error) {
  print('Pinecone error:', error.message);
  // Common errors:
  // - "Pinecone API key and index host are required"
  // - "message.organizationId and message.environmentId are required"
  // - HTTP errors (401 unauthorized, 400 bad request)
}
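Configuration errors and 400/401 responses will not succeed on retry. Rate limiting (HTTP 429) and 5xx responses are often transient. A generic retry wrapper with exponential backoff is sketched below. `withRetry` is hypothetical. This page does not document how the module exposes HTTP status codes, so the retryability decision is passed in as a function. The message-based check in the usage comment is an assumption.

```javascript
// Hypothetical retry wrapper with exponential backoff.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(operation, { retries = 3, baseDelayMs = 500, isRetryable = () => false } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (attempt >= retries || !isRetryable(error)) {
        throw error; // out of attempts, or an error that retrying will not fix
      }
      await sleep(baseDelayMs * 2 ** attempt); // 500, 1000, 2000 ms with the defaults
    }
  }
}

// Usage (assumption: the error message contains the HTTP status code):
// await withRetry(() => pineconeUpsert(vectors), {
//   isRetryable: (error) => /\b(429|5\d\d)\b/.test(error.message),
// });
```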