"Unlocking Success: Fine-Tuning Your Embeddings and Staying Scrappy"


Hatched by Glasp

Aug 26, 2023


This article explores two distinct topics that each contribute to success in their own domain. The first is fine-tuning embeddings for better similarity search and how that benefits a labeling workflow. The second is staying scrappy in entrepreneurial endeavors and how that drives innovation and achievement. The two may seem unrelated at first glance, but they share common threads, and the article closes with actionable advice that readers can apply in their own work.

Fine-Tuning Your Embeddings for Better Similarity Search:

To begin, let's look at what fine-tuning embeddings means and why it improves similarity search. Embeddings are numerical representations of data that capture its semantic meaning. They support the labeling process by enabling similarity search: a user selects a record and retrieves the records whose embeddings have the highest cosine similarity to it. The goal of fine-tuning is for a similarity labeling session to surface more records of the same class as the selected one. Fine-tuning achieves this by adjusting a language model to better fit the specific domain of the data being analyzed. It is worth checking first whether a suitable fine-tuned model already exists, but fine-tuning your own is a valuable option when the data calls for domain-specific expertise.
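The similarity search described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the tool's actual implementation: the function names, the record ids, and the three-dimensional toy vectors are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, top_k=3):
    """Return the ids of the top_k records closest to query_id, best first."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vector), record_id)
        for record_id, vector in embeddings.items()
        if record_id != query_id  # do not return the selected record itself
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record_id for _, record_id in scored[:top_k]]


# Toy embeddings (hypothetical); real ones typically have hundreds of dimensions.
embeddings = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
    "d": [0.7, 0.7, 0.0],
}
neighbors = most_similar("a", embeddings, top_k=2)
print(neighbors)
```

In a labeling session, the selected record plays the role of `query_id`, and the returned neighbors are the candidates shown to the annotator.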

Connecting Fine-Tuning and Similarity Learning:

Before diving into the experiment, it is crucial to understand similarity learning. In this discussion, similarity is defined by the class labels assigned to the records: two records are similar if they share a class label and dissimilar if their labels differ. In the experiment, a sample of 20,000 records was selected, of which 261 were manually labeled. Filtering for a confidence score greater than 0.7 left 10,854 usable records for the fine-tuning pipeline. Embeddings can be fine-tuned from several kinds of training signal, such as similarity scores, pre-formed triplets, or similarity groups based on class. Because class information was the primary similarity measure available, SimilarityGroupSamples were used.
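As a rough sketch of the data preparation described above, the snippet below filters records by confidence and turns class labels into group ids. The `Record` and `GroupSample` classes are hypothetical stand-ins written for this example (they are not the library's own sample type), and the example records are invented.

```python
from dataclasses import dataclass


@dataclass
class Record:
    text: str
    label: str
    confidence: float


@dataclass
class GroupSample:
    """Hypothetical stand-in for a group-based similarity sample."""
    obj: str
    group: int


def build_group_samples(records, min_confidence=0.7):
    """Keep records with confidence strictly above the threshold and
    assign one integer group id per class label."""
    label_to_group = {}
    samples = []
    for record in records:
        if record.confidence <= min_confidence:
            continue  # the text filters for confidence greater than 0.7
        group_id = label_to_group.setdefault(record.label, len(label_to_group))
        samples.append(GroupSample(obj=record.text, group=group_id))
    return samples


def are_similar(a, b):
    """Two samples count as similar exactly when they share a group."""
    return a.group == b.group


records = [
    Record("refund not received", "billing", 0.95),
    Record("charged twice", "billing", 0.81),
    Record("app crashes on login", "bug", 0.92),
    Record("maybe a question?", "other", 0.40),
    Record("login button frozen", "bug", 0.70),  # not above 0.7, so dropped
]
samples = build_group_samples(records)
print([(s.obj, s.group) for s in samples])
```

Grouping by class needs no pairwise scores or hand-built triplets, which is why it suits a setting where class labels are the only similarity signal.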

The Role of Pre-Trained LLM and SkipConnectionHead:

To map one embedding to another, a pre-trained Large Language Model (LLM) was used as the encoder, and a SkipConnectionHead was added on top of it as the part that is fine-tuned. This head uses skip connections instead of just a linear layer, an approach that has been found to be more effective. Traditionally, a classification head would be used instead, with a number of out-features equal to the number of classes. Measuring the effectiveness of embeddings, however, can be challenging. To address this, a metric called "top_1k" was introduced: it measures the increase in records of the same class within the 1,000 most similar records. The metric also makes it possible to assess how many records need to be labeled before fine-tuning becomes beneficial.
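The two ideas in this section can be sketched as follows. Both functions are simplified assumptions written for illustration, not the library's code: `skip_connection_head` shows the general shape of a gated skip connection (input plus a gated linear projection), and `top_k_same_class` counts same-class records among the k nearest neighbors, so the "top_1k" gain is the difference between the fine-tuned and raw counts at k=1000. The vectors and labels are toy values.

```python
import math


def skip_connection_head(vector, weights, gates):
    """Return vector + gates * (weights @ vector), applied element-wise.

    With all gates at 0 the head is the identity, so the output starts
    out equal to the encoder's raw embedding. Simplified assumption.
    """
    projected = [sum(w * x for w, x in zip(row, vector)) for row in weights]
    return [x + g * p for x, g, p in zip(vector, gates, projected)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_same_class(query_idx, vectors, labels, k=1000):
    """Count records sharing the query's label among its k nearest neighbors."""
    query = vectors[query_idx]
    others = [i for i in range(len(vectors)) if i != query_idx]
    others.sort(key=lambda i: cosine(query, vectors[i]), reverse=True)
    return sum(1 for i in others[:k] if labels[i] == labels[query_idx])


# Toy data (hypothetical): the same five records before and after fine-tuning.
labels = ["x", "x", "x", "y", "y"]
raw = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0], [0.9, 0.3], [0.8, 0.5]]
tuned = [[1.0, 0.0], [0.9, 0.2], [0.8, 0.4], [0.1, 1.0], [0.0, 1.0]]

raw_hits = top_k_same_class(0, raw, labels, k=2)
tuned_hits = top_k_same_class(0, tuned, labels, k=2)
gain = tuned_hits - raw_hits  # the quantity the metric reports

# With zero gates the head leaves the embedding unchanged.
identity = skip_connection_head([1.0, 2.0], [[0.0, 1.0], [1.0, 0.0]], [0.0, 0.0])
print(raw_hits, tuned_hits, gain, identity)
```

In practice the count would be averaged over many query records, and k would be set to 1000 as in the text; k=2 is used here only because the toy set has five records.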

Benefits of Fine-Tuning and Future Considerations:

The experiment showed that with as few as 25 records, the fine-tuned embeddings outperformed the raw embeddings. This finding is particularly promising, as it suggests that fine-tuning can improve labeling sessions even with limited data. Furthermore, embeddings fine-tuned with class information can potentially benefit classifiers trained on the same data. It is worth noting that basic Principal Component Analysis (PCA) often fails to separate embeddings effectively in just two dimensions, which makes annotation in a 2D view challenging. Consequently, efforts are underway to develop methods that refine embeddings to achieve better class separation in a 2D space.

Staying Scrappy: A Philosophy for Success:

Now, let's shift our focus to the concept of staying scrappy as a vital component of achieving success, especially for entrepreneurs and teams embarking on new ventures. Scrappiness is characterized by adaptability, a focus on impact, determination, and the willingness to fight for one's goals. It involves facing challenges head-on, being resourceful, and pushing through obstacles to make things happen. Staying scrappy allows individuals and teams to stay focused on their vision, identify opportunities, and navigate setbacks effectively.

The Importance of Iteration and Testing:

When building something new, iteration and testing play a pivotal role. Staying scrappy means embracing continuous experimentation, often referred to as "scrappy A/B testing." This approach entails finding product-market fit through constant iteration, learning, and adaptation. By avoiding bureaucracy and maintaining agility, entrepreneurs can move swiftly in a competitive marketplace and discover the right ground for their ideas to flourish.

Balancing Learning from Failure and Taking Risks:

While learning from past mistakes is essential, it is crucial not to be paralyzed by them. Striking a balance between leveraging past failures as valuable lessons and fearlessly exploring new possibilities is key. The fear of failure can hinder innovation, and it is important to embrace the idea that failure is a natural part of the path to success. Scrappiness entails celebrating small successes, using mistakes as learning opportunities, and constantly adapting and refining one's approach.

Fostering a Culture of Scrappiness:

To truly harness the power of scrappiness, organizations should cultivate a culture that encourages teams to test, learn, iterate, and take calculated risks. By embracing challenges, celebrating progress, and viewing mistakes as stepping stones rather than roadblocks, companies can foster innovation and achieve remarkable outcomes. Leaders play a crucial role in empowering their teams to be scrappy, ensuring that bureaucracy does not hinder progress, and providing the necessary support and resources to drive impact.

Actionable Advice:

  1. Embrace continuous experimentation: Incorporate scrappy A/B testing into your innovation process to iterate, learn, and adapt quickly.
  2. Encourage a culture of learning from failure: Create an environment where mistakes are seen as opportunities for growth and improvement.
  3. Foster agility and adaptability: Prioritize staying scrappy by promoting adaptability, resourcefulness, and a determination to overcome obstacles.


In conclusion, both fine-tuning embeddings for better similarity search and staying scrappy in entrepreneurial endeavors contribute to success. Fine-tuning embeddings can enhance similarity labeling sessions and may also benefit classifiers trained on the same data. Staying scrappy, in turn, empowers individuals and teams to navigate challenges, identify opportunities, and drive innovation. By applying the advice above, readers can overcome obstacles and move closer to their goals.
