AI for DBAs: Understanding AI Vectors

Imagine you’re building a robot librarian. Instead of organizing books by titles or authors, it groups them by meaning: romance novels near poetry, tech manuals near science fiction. This is the idea behind AI vectors and the databases that store them: messy data (words, images, sounds) is turned into meaningful numbers that machines can compare. Let’s break it down.

1. AI Vectors: Turning Chaos Into Numbers

Computers don’t understand words like “King” or images of cats, but they do understand numbers.
A famous example of the vector arithmetic this makes possible is: king – man + woman ≈ queen.

An AI vector is like a GPS coordinate for data:

  • Word example:
    • King → [0.5, 0.7, 0.3, ...]
    • Queen → [0.3, 0.9, 0.28, ...]
  • Image example: A cat photo → [0.6, 0.3, 0.75, ...]

These numbers aren’t random. They capture relationships. For instance, take the vector for “king,” subtract the vector for “man,” and add the vector for “woman.”

The result, which starts with [0.3, 0.9], is very close to the vector for “queen.” This shows how AI can find relationships between concepts using simple math: it can infer that “queen” is to “woman” as “king” is to “man.”
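
The arithmetic above can be sketched in a few lines of plain Python. This is only an illustration: the three-number vectors for “king” and “queen” reuse the toy values shown earlier, while “man,” “woman,” and “apple” are hypothetical values picked so the example works out. Real embeddings have hundreds of dimensions.

```python
import math

# Toy 3-dimensional vectors. "king" and "queen" reuse the illustrative values
# from the text; "man", "woman", and "apple" are hypothetical values.
vectors = {
    "king":  [0.5, 0.7, 0.3],
    "queen": [0.3, 0.9, 0.28],
    "man":   [0.6, 0.2, 0.1],
    "woman": [0.4, 0.4, 0.08],
    "apple": [0.9, 0.1, 0.8],
}

def cosine_similarity(a, b):
    """Return how closely two vectors point in the same direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word closest to a - b + c, excluding the three input words."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))

print(analogy("king", "man", "woman"))  # prints: queen
```

Cosine similarity is used here because it compares the direction of two vectors rather than their length, which is a common choice for comparing embeddings.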

2. Embeddings: The Magic Behind the Numbers

Creating these vectors is called embedding. Tools like Google’s Word2Vec or OpenAI’s text-embedding-ada-002 (ADA-002 for short) convert data into numbers while preserving meaning.

Example: The word “King” looks different across models:

Model              Vector (First Few Numbers)
Google Word2Vec    [0.53, 0.12, -0.31, 0.29, ...]
OpenAI ADA-002     [-0.12, 0.04, 0.67, -0.25, ...]

Each AI model creates its own unique vectors. While it is possible to train custom embeddings, doing so requires significant computational resources. Instead, many AI applications use pre-trained embeddings, such as Google’s Word2Vec model, which was trained on roughly 100 billion words of Google News text, so you don’t need to build your own.

3. Using Word2Vec in Python

Below is a Python script to obtain the vector representation for “King” using Word2Vec:

from gensim.models import KeyedVectors

# Load Google's pretrained Word2Vec model (300 dimensions)
word2vec_path = "GoogleNews-vectors-negative300.bin"  # Download required
model = KeyedVectors.load_word2vec_format(word2vec_path, binary=True)

# Get the vector for "king" (keys are case-sensitive, so use lowercase here)
king_vector = model["king"]
print(king_vector)

Example output:

[ 1.25976562e-01  2.97851562e-02  8.60595703e-03  1.39648438e-01
 -2.56347656e-02 -3.61328125e-02  1.11816406e-01 -1.98242188e-01
  5.12695312e-02  3.63281250e-01 -2.42187500e-01 -3.02734375e-01 ...]

(The model file GoogleNews-vectors-negative300.bin must be downloaded separately before running the script.)

4. Why Vector Databases? When “Good Enough” Isn’t Enough

Storing vectors in a traditional database is like organizing a library by tossing all the books into a single, unstructured pile. Sure, you could find a book eventually, but it would take forever. Vector databases act as the librarians of the AI world—designed to efficiently store, index, and retrieve vectors at high speed, even when dealing with millions of them.

A vector database is any database that natively stores and manages vector embeddings, which makes it possible to search unstructured data such as documents, images, videos, and audio. These databases excel at fast similarity searches, enabling AI-powered applications to find relevant information quickly. Oracle Database 23ai, with its AI Vector Search feature, is one such solution. Others include Pinecone, and libraries such as FAISS provide similar vector indexing and search without being full databases.
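
To make “similarity search” concrete, here is a minimal sketch of the brute-force version: compare the query against every stored vector and keep the closest matches. The file names and three-number vectors are hypothetical. A real vector database would typically use an index to avoid scanning every row, which is the work it saves you.

```python
import heapq
import math

# Hypothetical stored embeddings: (name, vector) pairs.
stored = [
    ("cat.jpg",    [0.6, 0.3, 0.75]),
    ("kitten.jpg", [0.58, 0.35, 0.7]),
    ("car.jpg",    [0.1, 0.9, 0.2]),
    ("truck.jpg",  [0.15, 0.85, 0.1]),
]

def cosine_distance(a, b):
    """1 - cosine similarity: near 0 means very alike, larger means less alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k(query, rows, k=2):
    """Scan every row and return the names of the k nearest vectors."""
    nearest = heapq.nsmallest(k, rows, key=lambda row: cosine_distance(query, row[1]))
    return [name for name, _ in nearest]

# Search with a vector that matches the cat photo.
print(top_k([0.6, 0.3, 0.75], stored))  # prints: ['cat.jpg', 'kitten.jpg']
```

A full scan like this costs one distance calculation per stored vector, which is fine for four rows and painful for millions.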

Here is an example of creating a vector table in Oracle 23ai.

CREATE TABLE image_vectors (
    id NUMBER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    image_name VARCHAR2(255),
    embedding VECTOR(500)  -- Stores 500-dimensional vector 
);


Learn More

For a deeper understanding of Word2Vec and AI embeddings, check out this video: Word Embedding and Word2Vec, Clearly Explained

Published by dbaliw

Highly experienced Oracle Database Administrator and Exadata Specialist with over 15 years of expertise in managing complex database environments. Skilled in cloud technologies, DevOps practices, and automation. Certified Oracle Cloud Infrastructure Architect and Oracle Certified Master with a strong background in performance tuning, high availability solutions, and database migrations.
