
Getting Started

What is Qdrant?

Qdrant is a vector database designed for high-performance vector similarity search. Unlike a traditional relational database, it stores and retrieves data by vector embeddings: numerical representations whose closeness reflects the semantic similarity of the underlying data points. Qdrant turns embeddings and neural network encoders into robust applications for matching, searching, recommending, and retrieval-augmented generation (RAG). It is well suited to recommender systems, image and video search, and NLP applications.

Basic Terminology

  1. Points:
    Points are the central entity that Qdrant operates on. A point is a record consisting of a vector and an optional payload. You can search among the points grouped in one collection based on vector similarity.

  2. Vectors:
    A vector is a numerical representation of a data point in a high-dimensional space. Because similar data points map to nearby vectors, Qdrant can perform similarity searches and other operations based on the geometric properties of the vectors. Vectors are typically produced by machine learning models, such as word or image embedding models.
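
To make the two terms above concrete, here is a minimal sketch of a point (a vector plus an optional payload) and a brute-force similarity search over a small in-memory collection. It uses only the Python standard library and does not call Qdrant; the `Point` class, the `search` helper, and the sample data are hypothetical names chosen for illustration, and cosine similarity is assumed as the metric.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Point:
    """Illustrative record: an id, a vector, and an optional payload."""
    id: int
    vector: List[float]
    payload: Dict[str, Any] = field(default_factory=dict)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(collection: List[Point], query: List[float], limit: int = 3) -> List[Tuple[float, Point]]:
    """Score every point against the query and return the best matches."""
    scored = [(cosine_similarity(p.vector, query), p) for p in collection]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# A tiny "collection": the third point has no payload, which is allowed.
collection = [
    Point(1, [0.9, 0.1, 0.0], {"city": "Berlin"}),
    Point(2, [0.1, 0.9, 0.0], {"city": "London"}),
    Point(3, [0.8, 0.2, 0.1]),
]

results = search(collection, [1.0, 0.0, 0.0], limit=2)
for score, point in results:
    print(point.id, round(score, 3), point.payload)
```

This scan compares the query against every point, which is only practical for small collections; avoiding that full scan is the job of a vector index.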

  3. Collection:
    A collection is a named container for organizing and storing related points. It is loosely comparable to a table in a traditional relational database: each row corresponds to a point, and the columns correspond to its vector and payload fields. Collections serve as the primary organizational unit within Qdrant, allowing users to group similar vectors for efficient storage, retrieval, and analysis.

  4. Nodes:
    A node is an individual instance of the vector database, typically a server or other computing resource. Nodes work together to store the vectors and to handle operations such as indexing and searching. They communicate with each other to synchronize data and coordinate tasks, which distributes the workload, keeps the database highly available and scalable, and lets it serve queries from multiple clients concurrently.

  5. Shards:
    A collection in Qdrant is made of one or more shards. A shard is an independent store of points that can perform all operations provided by collections. There are two methods of distributing points across shards:

    • Automatic Sharding:
      Points are distributed among shards by a consistent hashing algorithm, so that each shard manages a non-intersecting subset of points. This is the default behavior. When you create a collection, Qdrant splits it into shard_number shards. If left unset, shard_number is set to the number of nodes in your cluster.

    • User-defined Sharding:
      Each point is uploaded to a specific shard, so that operations can hit only the shards they need. Even in this mode, shards still hold non-intersecting subsets of points. Here, shard_number is the number of shards per shard key, across which points are distributed evenly.

    Shards are evenly distributed across all existing nodes when a collection is first created, but Qdrant does not automatically rebalance shards if your cluster size or replication factor changes (since this is an expensive operation on large clusters).
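
The two distribution methods can be sketched with a toy consistent-hash ring. This is an illustration under stated assumptions, not Qdrant's actual implementation: the SHA-256 hash, the `virtual_nodes` parameter, and the `HashRing` and `shard_for_key` names are all hypothetical choices made for this example.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Map a string to a 64-bit integer that is identical across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy consistent-hash ring mapping point IDs to shard numbers."""

    def __init__(self, shard_number: int, virtual_nodes: int = 64):
        # Each shard owns several positions on the ring to smooth the spread.
        self._ring = sorted(
            (stable_hash(f"shard-{shard}-{slot}"), shard)
            for shard in range(shard_number)
            for slot in range(virtual_nodes)
        )
        self._positions = [position for position, _ in self._ring]

    def shard_for(self, point_id) -> int:
        # Walk clockwise to the first ring position after the point's hash,
        # wrapping around to the start of the ring if necessary.
        index = bisect.bisect_right(self._positions, stable_hash(str(point_id)))
        return self._ring[index % len(self._ring)][1]


# Automatic sharding: every point ID lands in exactly one shard.
ring = HashRing(shard_number=3)
auto_shards = {shard: set() for shard in range(3)}
for point_id in range(1000):
    auto_shards[ring.shard_for(point_id)].add(point_id)

# User-defined sharding: the caller names a shard key, and the point is then
# spread over the shard_number shards that belong to that key.
per_key_ring = HashRing(shard_number=2)


def shard_for_key(shard_key: str, point_id) -> tuple:
    """Return (shard key, shard index within that key) for a point."""
    return (shard_key, per_key_ring.shard_for(point_id))


print({shard: len(ids) for shard, ids in auto_shards.items()})
print(shard_for_key("tenant-a", 7))
```

Because routing depends only on the point ID (and, in the user-defined case, on the shard key), the same point always maps to the same shard, and no point can appear in two shards.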

  6. Replication Factor:
    Qdrant allows you to replicate shards between nodes in the cluster. The replication factor is the number of copies kept of each shard. Shard replication increases the reliability of the cluster by keeping several copies of a shard spread across different nodes, so the data stays available after node failures as long as at least one replica of each shard survives.
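
The availability argument can be checked with a small simulation. The round-robin placement below is an assumption made for illustration, not a description of how Qdrant chooses nodes; `place_replicas`, `available_shards`, and the node names are hypothetical.

```python
from typing import Dict, List, Set


def place_replicas(shard_number: int, replication_factor: int, nodes: List[str]) -> Dict[int, List[str]]:
    """Assign each shard to replication_factor distinct nodes, round-robin."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        shard: [nodes[(shard + offset) % len(nodes)] for offset in range(replication_factor)]
        for shard in range(shard_number)
    }


def available_shards(placement: Dict[int, List[str]], failed_nodes: Set[str]) -> Set[int]:
    """A shard stays available while at least one of its replicas is alive."""
    return {
        shard
        for shard, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    }


nodes = ["node-a", "node-b", "node-c"]
placement = place_replicas(shard_number=3, replication_factor=2, nodes=nodes)

# One node down: every shard still has a surviving replica.
print(available_shards(placement, {"node-a"}))
# Two nodes down: the shard whose replicas were both on those nodes is lost.
print(available_shards(placement, {"node-a", "node-b"}))
```

With a replication factor of 2 on three nodes, any single node failure is survivable in this sketch, while losing two nodes takes out the one shard that had both replicas on them.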

  7. Index:
    A key feature of Qdrant is that it combines vector indexes with traditional indexes. This combination is essential because a vector index alone is not enough for vector search to work effectively with filters. In simpler terms, a vector index speeds up vector search, and payload indexes speed up filtering.
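
The division of labor between the two kinds of index can be sketched as follows. This is a simplified illustration, not Qdrant's implementation: the payload index is modeled as a plain inverted index (payload value to point IDs), and a brute-force dot-product ranking stands in for a real vector index. The names `build_payload_index` and `filtered_search` and the sample data are hypothetical.

```python
from collections import defaultdict

points = {
    1: {"vector": [0.9, 0.1], "payload": {"city": "Berlin"}},
    2: {"vector": [0.2, 0.8], "payload": {"city": "London"}},
    3: {"vector": [0.7, 0.3], "payload": {"city": "London"}},
    4: {"vector": [0.1, 0.9], "payload": {"city": "Berlin"}},
}


def build_payload_index(points, field_name):
    """Inverted index: payload value -> set of point IDs having that value."""
    index = defaultdict(set)
    for point_id, point in points.items():
        if field_name in point["payload"]:
            index[point["payload"][field_name]].add(point_id)
    return index


def dot(a, b):
    """Dot product, used here as the similarity score."""
    return sum(x * y for x, y in zip(a, b))


def filtered_search(points, payload_index, value, query, limit=1):
    """Narrow candidates with the payload index, then rank them by similarity."""
    candidates = payload_index.get(value, set())
    ranked = sorted(
        candidates,
        key=lambda point_id: dot(points[point_id]["vector"], query),
        reverse=True,
    )
    return ranked[:limit]


city_index = build_payload_index(points, "city")
hits = filtered_search(points, city_index, "London", [1.0, 0.0], limit=1)
print(hits)
```

Point 1 is the closest vector overall, but the filter restricts the candidates to London, so the search returns point 3. The payload index finds the matching candidates without scanning every payload, which is the filtering speed-up described above.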

What's Next?

Now that we understand the basics of Qdrant, let's create a Qdrant database. For a detailed guide, head over to Creating New Qdrant.