pinecone
Store and retrieve vectors in the Pinecone vector database.
The Pinecone module provides functions for vector operations including upsert, query, fetch, and delete. This enables semantic search, similarity matching, and RAG (Retrieval Augmented Generation) capabilities in workflows.
Overview
Pinecone is a managed vector database optimized for similarity search. The module automatically:
- Organizes vectors by organization/environment namespace
- Handles authentication and API communication
- Supports both entity argument and environment variable configuration
Namespace Format:
{namespacePrefix}/{organizationId}/{environmentId}/{customNamespace}
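As a rough illustration of the format above, the sketch below joins the four segments into a single namespace string. `buildNamespace` is a hypothetical helper, not a function exported by the module, and skipping empty segments (for example when no prefix or custom namespace is set) is an assumption about how the module behaves.

```javascript
// Hypothetical helper: joins namespace segments in the documented order.
// Assumption: empty or missing segments are skipped instead of leaving "//".
function buildNamespace({ namespacePrefix, organizationId, environmentId, customNamespace }) {
  return [namespacePrefix, organizationId, environmentId, customNamespace]
    .filter((segment) => typeof segment === 'string' && segment.length > 0)
    .join('/');
}

const exampleNamespace = buildNamespace({
  namespacePrefix: 'prod',
  organizationId: 'org-1',
  environmentId: 'env-1',
  customNamespace: 'events'
});
console.log(exampleNamespace); // prod/org-1/env-1/events
```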
Configuration
Pinecone can be configured via entity arguments (recommended) or environment variables.
Entity Arguments (Preferred)
Set these as arguments on your workflow entity:
| Argument Name | Description |
|---|---|
| pineconeApiKey | Your Pinecone API key |
| pineconeIndexHost | Index host URL (e.g., my-index-abc123.svc.us-east1-gcp.pinecone.io) |
| pineconeNamespacePrefix | Optional prefix for namespaces |
Example Entity Configuration:
{
"arguments": [
{
"argumentName": "pineconeApiKey",
"argumentValue": "{{PINECONE_API_KEY}}",
"argumentDescription": "Pinecone API key from environment"
},
{
"argumentName": "pineconeIndexHost",
"argumentValue": "{{PINECONE_INDEX_HOST}}",
"argumentDescription": "Pinecone index host"
}
]
}
Environment Variables (Fallback)
| Variable | Description |
|---|---|
| PINECONE_API_KEY | Your Pinecone API key |
| PINECONE_INDEX_HOST | Index host URL |
| PINECONE_NAMESPACE_PREFIX | Optional namespace prefix |
Entity arguments take precedence over environment variables.
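The precedence rule can be pictured with a minimal sketch like the one below. `resolvePineconeConfig` is a hypothetical function written for illustration only; the module's internal lookup is not shown in this document and may differ, for instance in how it treats empty strings.

```javascript
// Hypothetical sketch of the precedence rule:
// entity arguments win, environment variables are the fallback.
function resolvePineconeConfig(entityArguments = [], env = {}) {
  // Turn the entity "arguments" array into a name -> value lookup.
  const args = Object.fromEntries(
    entityArguments.map((arg) => [arg.argumentName, arg.argumentValue])
  );
  return {
    apiKey: args.pineconeApiKey || env.PINECONE_API_KEY,
    indexHost: args.pineconeIndexHost || env.PINECONE_INDEX_HOST,
    // Assumption: a missing prefix is treated as an empty string.
    namespacePrefix: args.pineconeNamespacePrefix || env.PINECONE_NAMESPACE_PREFIX || ''
  };
}

const resolved = resolvePineconeConfig(
  [{ argumentName: 'pineconeIndexHost', argumentValue: 'entity-host.example' }],
  { PINECONE_API_KEY: 'env-key', PINECONE_INDEX_HOST: 'env-host.example' }
);
console.log(resolved.indexHost); // entity-host.example
console.log(resolved.apiKey);    // env-key
```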
pineconeUpsert
Insert or update vectors in Pinecone.
Signature
await pineconeUpsert(vectors, namespace)
Description
Upserts one or more vectors to Pinecone. Each vector must have an id and values (the embedding); metadata is optional.
Parameters
| Parameter | Type | Description |
|---|---|---|
| vectors | object[] | Array of vector objects |
| namespace | string | Optional custom namespace (appended to org/env) |
Vector Object Structure:
{
id: string; // Unique identifier
values: number[]; // Embedding vector (float array)
metadata?: object; // Optional metadata for filtering
}
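A pre-flight check along these lines can catch malformed vectors before they reach the API. `validateVector` is a hypothetical helper, not part of the module, and the rules it enforces are inferred from the structure above rather than taken from the module's own validation.

```javascript
// Hypothetical pre-flight check for the vector structure described above.
// Returns a list of problems; an empty list means the vector looks valid.
function validateVector(vector) {
  const problems = [];
  if (typeof vector.id !== 'string' || vector.id.length === 0) {
    problems.push('id must be a non-empty string');
  }
  if (!Array.isArray(vector.values) || vector.values.length === 0) {
    problems.push('values must be a non-empty array');
  } else if (!vector.values.every((v) => typeof v === 'number' && Number.isFinite(v))) {
    problems.push('values must contain only finite numbers');
  }
  const { metadata } = vector;
  if (metadata !== undefined &&
      (metadata === null || typeof metadata !== 'object' || Array.isArray(metadata))) {
    problems.push('metadata must be a plain object when present');
  }
  return problems;
}

console.log(validateVector({ id: 'event-1', values: [0.1, 0.2] })); // []
console.log(validateVector({ id: '', values: [NaN] }).length);      // 2
```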
Returns
Promise<{upsertedCount: number}> — Number of vectors upserted.
Example
// Generate embedding for content
const embedding = await createEmbedding(message.content);
// Upsert single vector
await pineconeUpsert([{
id: `event-${message.eventId}`,
values: embedding,
metadata: {
type: message.type,
timestamp: Date.now(),
content: message.content.substring(0, 500) // Store preview
}
}]);
// Upsert multiple vectors
const vectors = messages.map(msg => ({
id: `msg-${msg.id}`,
values: msg.embedding,
metadata: { source: msg.source }
}));
await pineconeUpsert(vectors, 'messages');
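For large arrays it may help to upsert in smaller batches. The sketch below assumes a batch size of 100, which is an arbitrary choice and not a limit stated by this module. `chunk` and `upsertInBatches` are hypothetical helpers; the upsert function is passed in as a parameter so the sketch runs outside a workflow, and inside a workflow you would pass `pineconeUpsert`.

```javascript
// Hypothetical helper: split an array into chunks of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Upsert batch by batch and sum the reported counts.
// `upsert` is expected to behave like pineconeUpsert(vectors, namespace).
async function upsertInBatches(upsert, vectors, namespace, batchSize = 100) {
  let total = 0;
  for (const batch of chunk(vectors, batchSize)) {
    const { upsertedCount } = await upsert(batch, namespace);
    total += upsertedCount;
  }
  return total;
}
```

Inside a workflow, a call might look like `await upsertInBatches(pineconeUpsert, vectors, 'messages')`.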
pineconeQuery
Query Pinecone for similar vectors.
Signature
await pineconeQuery(vector, topK, namespace, filter)
Description
Finds vectors most similar to the query vector using cosine similarity. Returns scored matches with metadata.
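To make the score concrete, here is a minimal local computation of cosine similarity. It is only an illustration of the metric: Pinecone computes scores server-side, the metric is configured on the index, and `cosineSimilarity` is a hypothetical helper rather than a module function.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// Returns 0 when either vector has zero length (a convention chosen for this sketch).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same dimension');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```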
Parameters
| Parameter | Type | Description |
|---|---|---|
| vector | number[] | Query embedding vector |
| topK | number | Number of results to return (default: 10) |
| namespace | string | Optional custom namespace (appended to org/env) |
| filter | object | Optional metadata filter conditions |
Returns
Promise<{matches: Array<{id: string, score: number, metadata?: object}>}> — Scored matches.
Example
// Simple similarity search
const queryEmbedding = await createEmbedding('touchdown celebration');
const results = await pineconeQuery(queryEmbedding, 5);
for (const match of results.matches) {
print(`${match.id}: ${match.score.toFixed(3)}`);
print(` Content: ${match.metadata?.content}`);
}
// Filtered search
const filtered = await pineconeQuery(queryEmbedding, 10, '', {
type: { $eq: 'touchdown' },
timestamp: { $gt: Date.now() - 86400000 } // Last 24 hours
});
// Search specific namespace
const archived = await pineconeQuery(queryEmbedding, 3, 'archive-2024');
pineconeFetch
Fetch vectors by their IDs.
Signature
await pineconeFetch(ids, namespace)
Description
Retrieves specific vectors by their IDs. More efficient than querying when you know the exact IDs.
Parameters
| Parameter | Type | Description |
|---|---|---|
| ids | string[] | Array of vector IDs to fetch |
| namespace | string | Optional custom namespace |
Returns
Promise<{vectors: Record<string, {id: string, values: number[], metadata?: object}>}> — Map of ID to vector data.
Example
// Fetch specific vectors
const result = await pineconeFetch([
'event-12345',
'event-12346',
'event-12347'
]);
for (const [id, vector] of Object.entries(result.vectors)) {
print(`${id}: ${vector.metadata?.content}`);
}
// Check if vector exists
const existing = await pineconeFetch([`event-${message.eventId}`]);
if (Object.keys(existing.vectors).length > 0) {
print('Vector already exists');
}
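Building on the existence check above, a small helper can report which requested IDs were not returned. `findMissingIds` is hypothetical, and it assumes that IDs without a stored vector are simply absent from the `vectors` map.

```javascript
// Hypothetical helper: list the requested IDs that are absent from a fetch result.
// Assumption: IDs with no stored vector do not appear as keys in `vectors`.
function findMissingIds(ids, fetchResult) {
  const found = new Set(Object.keys(fetchResult.vectors || {}));
  return ids.filter((id) => !found.has(id));
}

// Sample result shaped like the documented return value.
const sampleResult = {
  vectors: {
    'event-12345': { id: 'event-12345', values: [0.1, 0.2], metadata: { content: 'preview' } }
  }
};
console.log(findMissingIds(['event-12345', 'event-12346'], sampleResult)); // [ 'event-12346' ]
```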
pineconeDelete
Delete vectors from Pinecone by ID.
Signature
await pineconeDelete(ids, namespace)
Description
Deletes specific vectors by their IDs.
Parameters
| Parameter | Type | Description |
|---|---|---|
| ids | string[] | Array of vector IDs to delete |
| namespace | string | Optional custom namespace |
Returns
Promise<{}> — Empty object on success.
Example
// Delete specific vectors
await pineconeDelete(['event-12345', 'event-12346']);
// Delete from specific namespace
await pineconeDelete(['event-99999'], 'temp-data');
Use Cases
1. Semantic Search for Context
Find relevant historical content for prompt enrichment:
// Generate embedding for current event
const queryEmbedding = await createEmbedding(
`${message.payload.event.description} ${message.payload.event.play_type}`
);
// Find similar past events
const similar = await pineconeQuery(queryEmbedding, 3);
// Build context for prompt
let context = 'Similar past events:\n';
for (const match of similar.matches) {
context += `- ${match.metadata?.content}\n`;
}
// Use context in prompt
var prompt = `
Based on these similar past events:
${context}
Generate a social media post for: ${message.payload.event.description}
`;
2. Deduplication with Similarity
Prevent near-duplicate content:
const embedding = await createEmbedding(message.content);
// Check for similar existing content
const similar = await pineconeQuery(embedding, 1);
if (similar.matches.length > 0 && similar.matches[0].score > 0.95) {
print('Content too similar to existing:', similar.matches[0].id);
return; // Skip processing
}
// Content is unique enough, proceed
await pineconeUpsert([{
id: `content-${message.id}`,
values: embedding,
metadata: { content: message.content }
}]);
3. RAG (Retrieval Augmented Generation)
Enhance AI prompts with relevant documentation:
// User's question
const question = message.payload.question;
const questionEmbedding = await createEmbedding(question);
// Search documentation
const docs = await pineconeQuery(questionEmbedding, 5, 'documentation');
// Build RAG context
let context = '';
for (const doc of docs.matches) {
context += `${doc.metadata?.title}:\n${doc.metadata?.content}\n\n`;
}
// Generate answer with context
var prompt = `
Answer the following question using only the provided context.
Context:
${context}
Question: ${question}
Answer:
`;
await executePromptWithModel();
4. Content Indexing Pipeline
Index new content as it arrives:
// Only index certain event types
if (!['touchdown', 'field_goal', 'interception'].includes(message.payload.event.play_type)) {
return;
}
// Generate embedding
const content = `${message.payload.event.description}. ${message.payload.event.player?.name || ''} ${message.payload.event.team?.name || ''}`;
const embedding = await createEmbedding(content);
// Upsert to Pinecone
await pineconeUpsert([{
id: `event-${message.payload.event.id}`,
values: embedding,
metadata: {
playType: message.payload.event.play_type,
gameId: message.payload.game.id,
timestamp: message.payload.timestamp,
content: content
}
}], 'events');
print('Indexed event:', message.payload.event.id);
Filter Syntax
Pinecone supports metadata filtering with these operators:
| Operator | Description | Example |
|---|---|---|
| $eq | Equal | { type: { $eq: 'touchdown' } } |
| $ne | Not equal | { type: { $ne: 'timeout' } } |
| $gt | Greater than | { score: { $gt: 7 } } |
| $gte | Greater than or equal | { quarter: { $gte: 3 } } |
| $lt | Less than | { timestamp: { $lt: 1234567890 } } |
| $lte | Less than or equal | { quarter: { $lte: 2 } } |
| $in | In array | { type: { $in: ['touchdown', 'field_goal'] } } |
| $nin | Not in array | { type: { $nin: ['timeout', 'penalty'] } } |
Combining Filters:
// Arguments are positional: vector, topK, namespace, filter
const results = await pineconeQuery(embedding, 10, '', {
$and: [
{ type: { $eq: 'touchdown' } },
{ quarter: { $gte: 3 } },
{ team: { $in: ['Chiefs', 'Eagles'] } }
]
});
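To reason about what a filter will match, it can help to evaluate it locally against sample metadata. The sketch below approximates the semantics of the operators in the table plus `$and`. `matchesFilter` is a hypothetical helper; Pinecone evaluates filters server-side, and its handling of edge cases such as missing fields may differ from this approximation.

```javascript
// Local approximation of the operators listed in the table above.
const comparators = {
  $eq: (value, expected) => value === expected,
  $ne: (value, expected) => value !== expected,
  $gt: (value, expected) => value > expected,
  $gte: (value, expected) => value >= expected,
  $lt: (value, expected) => value < expected,
  $lte: (value, expected) => value <= expected,
  $in: (value, list) => list.includes(value),
  $nin: (value, list) => !list.includes(value)
};

// Hypothetical helper: true when `metadata` satisfies every condition in `filter`.
function matchesFilter(metadata, filter) {
  return Object.entries(filter).every(([key, condition]) => {
    if (key === '$and') {
      return condition.every((subFilter) => matchesFilter(metadata, subFilter));
    }
    // Assumption: a bare value is shorthand for equality.
    if (condition === null || typeof condition !== 'object') {
      return metadata[key] === condition;
    }
    return Object.entries(condition).every(([operator, expected]) => {
      const compare = comparators[operator];
      if (!compare) throw new Error(`Unsupported operator: ${operator}`);
      return compare(metadata[key], expected);
    });
  });
}

const combinedFilter = {
  $and: [
    { type: { $eq: 'touchdown' } },
    { quarter: { $gte: 3 } },
    { team: { $in: ['Chiefs', 'Eagles'] } }
  ]
};
console.log(matchesFilter({ type: 'touchdown', quarter: 4, team: 'Chiefs' }, combinedFilter)); // true
console.log(matchesFilter({ type: 'touchdown', quarter: 2, team: 'Chiefs' }, combinedFilter)); // false
```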
Best Practices
Embedding Dimensions
Ensure your embeddings match the Pinecone index dimension:
- Amazon Titan text-embedding-v2: 1024 dimensions
- OpenAI text-embedding-3-small: 1536 dimensions
- OpenAI text-embedding-ada-002: 1536 dimensions
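A guard like the following can fail fast when an embedding does not match the index. `assertDimension` is a hypothetical helper, and the expected dimension must come from your own index configuration; the value 1024 below is only an example.

```javascript
// Hypothetical guard: throw early when an embedding has the wrong length.
function assertDimension(values, expectedDimension) {
  if (!Array.isArray(values) || values.length !== expectedDimension) {
    const actual = Array.isArray(values) ? values.length : 'not an array';
    throw new Error(`Embedding dimension mismatch: expected ${expectedDimension}, got ${actual}`);
  }
  return values;
}

// Example: an index created with 1024 dimensions (assumed for illustration).
const INDEX_DIMENSION = 1024;
const sampleEmbedding = new Array(1024).fill(0.01);
console.log(assertDimension(sampleEmbedding, INDEX_DIMENSION).length); // 1024
```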
Metadata Guidelines
- Store searchable fields — Fields you'll filter on
- Include content previews — For debugging and display
- Add timestamps — For time-based queries
- Keep metadata small — Pinecone has metadata size limits
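One way to keep an eye on metadata size is to measure its serialized length before upserting. The sketch below assumes a 40 KB per-vector limit and measures the UTF-8 size of the JSON form; both are assumptions, so check the current Pinecone documentation for the actual limit and how it is counted. The helper names are hypothetical.

```javascript
// Assumption: 40 KB of metadata per vector; confirm against current Pinecone docs.
const ASSUMED_METADATA_LIMIT_BYTES = 40 * 1024;

// Approximate size: UTF-8 byte length of the JSON-serialized metadata.
function metadataSizeBytes(metadata) {
  return new TextEncoder().encode(JSON.stringify(metadata)).length;
}

function fitsMetadataLimit(metadata, limitBytes = ASSUMED_METADATA_LIMIT_BYTES) {
  return metadataSizeBytes(metadata) <= limitBytes;
}

const sampleMetadata = {
  type: 'touchdown',
  timestamp: 1700000000000,
  content: 'x'.repeat(500) // Preview-sized content, as in the upsert example
};
console.log(fitsMetadataLimit(sampleMetadata)); // true
```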
Namespace Organization
{prefix}/
{org}/{env}/
events/ # Event embeddings
documentation/ # RAG documents
responses/ # Generated content
Error Handling
try {
await pineconeUpsert(vectors);
} catch (error) {
print('Pinecone error:', error.message);
// Common errors:
// - "Pinecone API key and index host are required"
// - "message.organizationId and message.environmentId are required"
// - HTTP errors (401 unauthorized, 400 bad request)
}
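Transient HTTP failures can sometimes be handled by retrying with a growing delay. The wrapper below is a generic sketch, not something the module provides: `withRetry` and its options are hypothetical, and it assumes `setTimeout` is available in the workflow runtime. Configuration errors such as a missing API key will not be fixed by retrying, which is what the `shouldRetry` option is for.

```javascript
// Hypothetical retry wrapper with exponential backoff.
// `operation` is any async function, e.g. () => pineconeUpsert(vectors).
async function withRetry(operation, options = {}) {
  const { attempts = 3, baseDelayMs = 200, shouldRetry = () => true } = options;
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Stop early on the final attempt or when the error is not worth retrying.
      if (attempt === attempts || !shouldRetry(error)) break;
      // Wait baseDelayMs, then 2x, then 4x, and so on.
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

Inside a workflow, a call might look like `await withRetry(() => pineconeUpsert(vectors), { shouldRetry: (e) => !/required/.test(e.message) })`, which skips retries for the configuration errors listed above.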
Related Topics
- prompt - createEmbedding — Generate embeddings
- STM — Short-term memory storage
- Consumer Evaluator — Workflow execution