Posts

Understanding Vector Normalization

Vectors are fundamental entities used to represent quantities that have both direction and magnitude. Whether you’re working in machine learning, physics, or computer graphics, vectors play a crucial role. However, the raw magnitude of a vector is not always useful in a computation. This is where vector normalization comes into play. Vector normalization scales a vector to unit length (a magnitude of 1) while preserving its direction. This operation is particularly useful in applications such as machine learning, physics simulations, and computer graphics. ...
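As a minimal sketch of the idea in the excerpt above, here is one way to normalize a vector using only the standard library. The function name `normalize` and the choice to raise on a zero vector are assumptions made for illustration, not something taken from the post.

```python
import math


def normalize(vector):
    """Return a copy of `vector` scaled to length 1, pointing the same way."""
    magnitude = math.sqrt(sum(x * x for x in vector))
    if magnitude == 0:
        # The zero vector has no direction, so there is nothing to preserve.
        raise ValueError("cannot normalize the zero vector")
    return [x / magnitude for x in vector]


unit = normalize([3.0, 4.0])
print(unit)  # [0.6, 0.8]
```

The input `[3.0, 4.0]` has magnitude 5, so dividing each component by 5 leaves the direction unchanged and brings the length to 1.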

What is a Lamport Clock?

When data is stored across multiple servers in a distributed system, it is crucial to determine the order in which operations occurred to maintain consistency and ensure the system behaves correctly. But why can’t we rely on system timestamps? Non-monotonicity of system clocks: System clocks are not guaranteed to be strictly increasing over time (monotonic). For instance, when servers synchronize their time using the Network Time Protocol (NTP), the clock may be adjusted backward if it was ahead of the actual time. This backward adjustment makes it hard to determine the true order of events, because a later operation might appear to have occurred before an earlier one. Crystal oscillator drift: System timestamps are generated from the server’s internal clock, which relies on a crystal oscillator. Over time, this oscillator drifts, causing the server’s clock to become slightly inaccurate. NTP corrects this drift, but the correction can cause time to “jump” backward or forward, further complicating event ordering. Incomparable clocks across servers: Even if each server’s clock never jumped, timestamps from different servers cannot be directly compared, because each server’s clock might be slightly ahead of or behind the others. Lamport Clocks to the rescue: To address these challenges, Lamport Clocks are used. They assign a logical timestamp to each event in a distributed system, which gives a consistent order of events. ...
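A minimal sketch of the standard Lamport clock rules, assuming a single counter per server: increment it on every local event or send, and on receipt jump to one past the larger of the local and received timestamps. The class and method names here are hypothetical and chosen for illustration.

```python
class LamportClock:
    """A logical clock: a counter that only ever moves forward."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Call for a local event or just before sending a message."""
        self.time += 1
        return self.time

    def receive(self, remote_time):
        """Call when a message stamped `remote_time` arrives."""
        self.time = max(self.time, remote_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                    # local event on a, a.time is now 1
sent = a.tick()             # a sends a message stamped 2
b.tick()                    # unrelated local event on b, b.time is now 1
received = b.receive(sent)  # b moves to max(1, 2) + 1 = 3
print(sent, received)       # 2 3
```

Because the receive always lands on a timestamp greater than the send, a message is never ordered before the event that caused it, regardless of what the wall clocks on the two servers say.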

Update payload for multiple points in Qdrant

Qdrant vector DB supports updating multiple points using batch_update_points(). It takes multiple different operations as parameters and runs them as a single API operation. batch_update_points() supports eight kinds of update operations: UpsertOperation, DeleteOperation, UpdateVectorsOperation, DeleteVectorsOperation, SetPayloadOperation, OverwritePayloadOperation, DeletePayloadOperation, and ClearPayloadOperation. batch_update_points() supports mixing different kinds of operations in one request. For example, you could perform an UpsertOperation for point ids [1, 2, 3] along with a DeleteOperation on point ids [4, 5, 6]. ...

Buffered vs Unbuffered I/O

As applications demand more from databases and disk performance struggles to keep up, the way we handle I/O operations becomes crucial. Buffered I/O is like the memory foam of database operations. Instead of writing directly to the disk, data is first written to a buffer in memory. The operating system then decides when to flush this data to the physical disk. This approach makes the disk appear faster because the OS can batch operations, reducing the frequency of disk writes. It’s like a smooth operator, managing spikes in I/O demand and keeping things running consistently. ...
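To make the buffered path concrete, here is a small sketch using Python's low-level `os` calls. It assumes a typical OS where `os.write` returns once the data is in the kernel's in-memory cache, and `os.fsync` asks the OS to push that data to the storage device. The file name is a throwaway chosen for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.log")

fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
try:
    # Buffered write: the call returns as soon as the OS has accepted the
    # bytes into memory. The OS decides when they actually reach the disk.
    written = os.write(fd, b"hello")

    # Explicit flush: block until the OS reports the data has been handed
    # to the storage device, instead of waiting for it to decide.
    os.fsync(fd)
finally:
    os.close(fd)

print(written)  # 5
```

Bypassing the OS cache entirely (for example with `O_DIRECT` on Linux) is platform-specific and needs aligned buffers, so it is left out of this sketch.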

A Practical Guide to Pyenv and Shims for Python Developers

Managing multiple versions of Python can be tricky, especially when juggling different projects requiring their own specific environments. That’s where Pyenv comes in. It’s a handy tool that makes switching between Python versions effortless. In this article, we’ll explore how Pyenv works, the concept of shims, and why these features make Pyenv so useful. What Pyenv Offers: Pyenv is a versatile tool that provides several key features to help manage your Python environment: ...

Building Timble: A vector content recommendation engine

As content becomes more and more freely available, it has become kind of a full-time job to identify what to watch next. The streaming services have their own recommendation algorithms, but they work in silos. In this article we are going to build a bare-bones recommendation system using Qdrant vector search. We will be using semantic similarity to find movie and TV show recommendations. Disclaimer: I am no expert in AI, recommendation systems or vector databases. This post is an experiment to move towards that point in this high dimensional vector space. ...

Mixins in Python

A mixin is a class that provides some functionality that can be easily incorporated into other classes. Mixins usually provide standalone functionality that can be reused in many different classes. Python supports multiple inheritance, which means that a class can inherit from multiple parents. It also means that the order of parent classes becomes important. Python has the concept of MRO. MRO, or Method Resolution Order, is the order in which Python looks for a method in a hierarchy of classes. The order goes from left to right, which means the rightmost class can be considered the base(-est) class. Methods in the classes on the right are overridden by methods of the same name in the classes on the left. ...
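A short sketch of both ideas from the excerpt above: a mixin that adds standalone behaviour, and the left-to-right lookup order. All class names here are made up for illustration.

```python
import json


class JsonMixin:
    """Standalone behaviour: serialize any object with plain attributes."""

    def to_json(self):
        return json.dumps(vars(self), sort_keys=True)


class ShoutMixin:
    def describe(self):
        return "SHOUT"


class Entity:
    def describe(self):
        return "entity"


# Parents are searched left to right, so ShoutMixin.describe wins
# over Entity.describe.
class User(ShoutMixin, JsonMixin, Entity):
    def __init__(self, name):
        self.name = name


user = User("ada")
print(user.to_json())   # {"name": "ada"}
print(user.describe())  # SHOUT
print([cls.__name__ for cls in User.__mro__])
# ['User', 'ShoutMixin', 'JsonMixin', 'Entity', 'object']
```

Swapping the parent order to `User(Entity, JsonMixin, ShoutMixin)` would make `describe()` return `"entity"` instead, which is why mixins are conventionally listed to the left of the base class.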

Tensor Operations: Zero to Hero

A Tensor is a container that can hold an N dimensional data structure. Neural Networks love numbers. In fact that’s all they understand. GPUs are great at handling numbers. And they can operate on many numbers in parallel. Therefore a key idea in machine learning is to group numbers together and create a Tensor that can be handed over to the GPU. Arrays and tensors: An array is a one dimensional data structure and a tensor that has a single dimension is called a rank 1 tensor. A matrix is a two dimensional data structure and a tensor that has two dimensions is called a rank 2 tensor. A stack of matrices can be thought of as a three dimensional data structure and a tensor that has three dimensions is called a rank 3 tensor. Enough text, let’s look at some code. ...
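As a library-free illustration of rank, the sketch below uses nested Python lists as a stand-in for tensors. This is an assumption made to keep the example self-contained; the post itself may use a tensor library. The helper `rank` is hypothetical and counts how many levels of nesting the data has.

```python
def rank(data):
    """Count nesting depth of a regular nested list (its number of dimensions)."""
    depth = 0
    while isinstance(data, list):
        depth += 1
        if not data:
            break  # an empty list still counts as one dimension
        data = data[0]
    return depth


vector = [1, 2, 3]              # one dimension
matrix = [[1, 2], [3, 4]]       # two dimensions
stack = [matrix, matrix]        # a stack of matrices, three dimensions

print(rank(vector), rank(matrix), rank(stack))  # 1 2 3
```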

Langchain 101

Langchain is probably the easiest way to build LLM based applications. According to Andrej Karpathy, LLMs are like operating systems that allow developers to build apps using their broad ranging capabilities. If we build on that analogy, Langchain would be analogous to a framework such as .NET, Django, or Express. As per Langchain’s State of AI 2023 report, 42% of LLM applications involve some kind of retrieval system and 17% involve an agentic system. There is a huge push towards agentic systems from a lot of giants of AI, including people like Andrew Ng and Andrej Karpathy. ...

How I Am Learning AI

I have been a software engineer for the past 12 years now. I recently caught the AI bug and have decided to go all in on AI. While the thought of making this shift is intimidating at times, it has been a few years in the making. At this point I am more okay with learning AI and falling flat on my face than continuing to wonder how wonderful it would be if I somehow became an expert magically. ...