August 16, 2023

Our Co-Pilot for Generative AI

Dhanush Ram

Step into a world where the boundaries between human and machine creativity blur, and the realm of Generative AI takes center stage.

For decades, computers have operated as hyper-literal systems, but today marks a groundbreaking shift as they have begun to crack the foundations of human intelligence: common sense, abstract thinking, learning and adaptation. Over the past six months, the interest and excitement surrounding Generative AI have skyrocketed, transforming the landscape of software development. The speed at which innovation is occurring, fueled by the introduction of novel models and tools, is difficult to grasp, leaving knowledge workers racing to stay up-to-date with the latest advancements.

But what exactly is Generative AI? In simple terms, it refers to algorithms and models that have the ability to autonomously create content (realistic images, original music, coherent texts, and more), often based on patterns learned from vast amounts of data. It functions as the symphony conductor of AI, crafting masterpieces without human intervention.

The more time we spend experimenting with these tools and testing the limits at which they can operate, the more we find ourselves questioning the very essence of what it means to be human. Are these creations truly original, or are they simply products of a machine’s programming? What does it mean, anyway, to be truly original?

At Speciale Invest, we’ve spent the past few months developing a Co-pilot for Generative AI — our own handy document to assist us in exploring, understanding and taking stock of the landscape of the ever-evolving Generative AI technologies and applications. In the spirit of fostering learning and collaboration, we have decided to share our GenAI Co-pilot and invite conversations, experiences and partnerships from the broader ecosystem to help us all navigate this space better.

As software has “eaten the world”, Generative AI is now “eating the software” itself, revolutionizing the way we create, interact with, and utilize software applications. We strongly believe that Generative AI is now a reality, and understanding it is essential for industry professionals in all domains: entire workflows will be reimagined, creative processes revolutionized, whole tech stacks disrupted, and the way we interact with the software around us changed forever.

To comprehend this technology trend, we think drawing parallels from the past and learning from history might be helpful.

Similar to how the cloud revolution enabled the consolidation of storage and computing infrastructure, democratizing the technology at accessible costs for businesses of all sizes, we anticipate that Generative AI’s foundational models will consolidate internet-scale intelligence for a massive audience, representing a fundamental technological shift for society.

Furthermore, Generative AI builds on top of this cloud infrastructure, leveraging it to allocate resources for complex computations: scaling up training on large datasets, hosting and fine-tuning models, and enabling remote access for end users.

Thanks to the advent of cloud computing, advancements in GPUs, and open-source frameworks like PyTorch, Generative AI has witnessed remarkable progress. These technological advancements have provided an efficient and widely accessible foundation for model training and inference, playing a pivotal role in enabling breakthroughs in the field. As a result, Generative AI techniques can now be applied across multiple modalities, ranging from NLP, image, video and voice to even the physical synthesis of proteins, leading to a continuous stream of open-source and proprietary models.

We believe that the applications of GenAI can be broadly categorized into three distinct buckets, each demonstrating its unique capabilities:

  • Ask Questions against Knowledge: The ability to take users’ questions and generate relevant responses based on the knowledge the model has acquired. This category focuses on leveraging generative AI to answer the questions users pose with informative responses. Examples of this category include Chatbots, Virtual Assistants, Code Explanation tools, and more.
  • Insights from Knowledge: Extracting valuable insights from extensive datasets, enabling data-driven decisions and enhancing the understanding of complex information. This category encompasses Prediction Engines, Writing Assistants, Data Segmentation tools, and other applications that help derive meaningful patterns and insights from data.
  • New Data based on Knowledge: Ability to create new data based on the knowledge it has acquired. From generating text and music to artwork and gaming content, the possibilities are virtually limitless. This category includes applications such as AI-generated Gaming Content and Music Composition tools, which empower creators with new and diverse content possibilities.

AI is transforming the building blocks of these applications as well, changing the way we build software and creating what some refer to as “Software 2.0”. In a previous blog, we discussed how the development of Foundation Models is the “iPhone moment” of software engineering. While the shift towards copilots and autopilots can already be seen, we now expect the AI software stack to evolve into four layers:

  • Infrastructure Layer (Hardware and Compute): Consists of the essential hardware and infrastructure components that enable the operation of modern AI-based software. This layer significantly influences the cost of development and running these models. High-performance GPUs and specialized AI chips are essential components within this layer, ensuring efficient computation for complex AI tasks.
  • Foundational Layer: At the core of the AI software stack lies the Foundational Layer, which comprises the large AI models that power these tools and act as the brain of modern applications. These models, often pre-trained on vast datasets, form the basis of various AI tasks, from natural language processing to image generation. The landscape of LLMs (Large Language Models) is constantly expanding, with a few leading examples including OpenAI’s GPT-4 and DALL-E, Cohere, Anthropic’s Claude, Meta’s LLaMA, and models from Stability AI, MosaicML and Inflection AI.
  • Enablement Layer: LLMs are different from traditional ML models, and given the high cost associated with their usage, it becomes crucial to develop effective management and deployment tools that optimize their performance and reduce costs. This Enablement Layer (Middleware & Tooling) bridges the gap between the underlying infrastructure and the application layer, providing a set of tools, frameworks, and services for seamless integration, communication, monitoring, and security.
  • Application Layer: This is where end-user solutions are built. While the other layers play a vital role in ensuring the operability of the software, the application layer generates the most value by providing practical solutions to users. This is the front-facing part of the AI software stack, where AI-powered products and services come to life.

In the following sections, we talk about what we find specifically exciting about the above-mentioned layers.

The Rise of Domain-Specific Foundational Models

While data takes center stage and foundation models abound, the proliferation of AI-first applications will fuel the development of numerous specialized domain- or vertical-specific models. However, LLMs (Large Language Models) like ChatGPT and Bard have a drawback: they are highly generalized and can sometimes produce incorrect or fictional information, requiring users to rigorously fact-check their outputs.

There is ongoing and exciting research attempting to tackle LLM hallucinations. Retrieval Augmented Generation (RAG, first laid out in this research paper) is a powerful technique that retrieves information from outside the foundational model and augments prompts to the model with contextual, relevant retrieved information.
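To make the pattern concrete, here is a minimal sketch of RAG in Python. The `embed` function below is a toy stand-in for a real embedding model, and `call_llm` is a placeholder for whatever hosted or open-source model endpoint one uses; both names are our own invention, not from any specific library.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Toy stand-in for a real embedding model: hashed bag-of-words.
    v = np.zeros(dim)
    for token in text.lower().split():
        v[hash(token) % dim] += 1.0
    return v

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a call to your hosted or open-source model.
    return "<model response>"

# Embed the knowledge base once, offline.
documents = ["Claims above $500 need manager approval.",
             "Refunds are processed within 14 business days."]
doc_vectors = np.array([embed(d) for d in documents])

def answer_with_rag(question: str, top_k: int = 1) -> str:
    # Retrieve: rank documents by cosine similarity to the question.
    q = embed(question)
    scores = doc_vectors @ q / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q) + 1e-9)
    context = "\n".join(documents[i] for i in np.argsort(scores)[::-1][:top_k])
    # Augment: prepend the retrieved context, then let the model generate.
    prompt = (f"Answer using only the context below.\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return call_llm(prompt)
```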

A novel approach, more suited to highly specialized systems, is called the Small Language Model (SLM). SLMs leverage similar techniques as LLMs but with a crucial distinction — they are designed to focus on specific domains. By narrowing their scope, SLMs aim to provide more accurate and reliable results, reducing the need for extensive fact-checking.

During their initial stages, SLMs may utilize pre-trained logic from an LLM, either sourced from a vendor or an open source community. Subsequently, they refine and customize this logic to align with their intended domain, allowing for better governance and ensuring the model adheres closely to the domain’s requirements.

Think of this as a recent high-school graduate using her base knowledge to specialize in engineering, law, medicine or any other field of study.

One significant advantage of SLMs lies in their energy and cost efficiency. Training smaller models consumes less energy and is more economical.

Additionally, SLMs optimize knowledge storage by moving memorized information out of model parameters and into an external database. This approach reduces the number of necessary parameters and simplifies knowledge updates: rather than retraining the model, new data can simply be embedded and the resulting document embeddings stored in the database, enabling seamless knowledge expansion and refinement.
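A small illustration of that update path, reusing the hypothetical `embed` helper from the RAG sketch above: adding knowledge becomes a cheap database write rather than a training run.

```python
# Hypothetical external knowledge store: facts live outside the model weights.
knowledge_store = []  # list of (text, embedding) pairs

def add_document(text: str) -> None:
    # Updating knowledge is a database write, not a retraining run.
    knowledge_store.append((text, embed(text)))

add_document("2024 pricing: the enterprise tier starts at 50 seats.")
add_document("Support SLAs changed to 4 business hours in March.")
```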

The Potential of Multimodal Foundational Models

Multimodality refers to the use of multiple modes of information, such as text, images, and audio, to improve the understanding and analysis of data: the use of more than one mode of communication to create meaning. Video, for example, can be thought of as multimodal, because it stitches together audio, images and (often) text.

As humans, we are constantly learning from data of all kinds and we believe that transferring this capability to foundational models will be the most natural evolution of AI. The concept of cross-modal understanding has the potential to enhance connections and enable more intelligent systems in various domains. Some of the potential applications include video-based Q&A, medical image diagnosis and more. Multimodality is currently an active area of research with promising implications for video understanding and other fields.

Vertical-AI SaaS

The rapid growth of software applications in the past decade has raised the bar for businesses, which now demand customized functionality in the software they use. When software workflows are closely aligned with the tasks performed by end users, the result is higher user adoption, better retention rates, and reduced customer churn. Achieving this level of integration requires a deep understanding of the specific needs and challenges faced by customers and their industries.

With the AI Platform Shift, the next logical iteration is Vertical-AI, which involves AI platforms trained on industry-specific datasets.

We think there will be an emergence of new AI-native vertical applications and also the embedding of AI functionality into existing SaaS incumbents. The winners in the Vertical AI market will be those who can access proprietary industry data, effectively train LLMs, package those models into applications, and deliver value to customers. We also see demand for companies that can generate these proprietary structured data sets for other Vertical AI companies.

As mentioned earlier, the adoption of Small Language Models (that are vertically focused) opens up exciting possibilities for specialized and fine-tuned AI applications, ushering in a new era of domain-specific AI solutions that strike a balance between efficiency, accuracy, and interpretability.

Enablement Layer — Solving for LLMOps and Data Bottlenecks

Over the past few months, we have hosted a series of GenAI-focused meetups and talks and have had the opportunity to meet and learn from many founders, engineering and product leaders as well as data professionals building or experimenting with this new technology.

One common thread that came up repeatedly was that while GenAI holds significant promise in its capabilities, its applications in production are beset with many challenges.

When dealing with LLMs, developers often face several challenges, such as balancing the model’s performance, inference cost, and latency, while addressing the need for significant computational resources. Fine-tuning LLMs requires continuous effort and the establishment of reliable data pipelines for data collection, preprocessing, and annotation. LLMs are also prone to generating false information, which is known as “Hallucination”. On top of that, developers need to understand and explain how the model makes decisions and optimize its learning process. Furthermore, privacy and security concerns need to be addressed when deploying LLMs in production. However, achieving this optimization in production settings is far from easy.

To tackle these challenges, the enablement layer offers a suite of tools and middleware that streamline the entire LLM lifecycle, from development to deployment and maintenance. This involves various stages: data acquisition and preprocessing, fine-tuning the model, deploying it into production, and continuously monitoring and updating it to ensure optimal performance.

The rise of LLMs demands that we rethink our MLOps tools and processes, and necessitates a more holistic approach towards building and deploying these models; the future of MLOps should be equipped to fully embrace the era of large foundation models. LLMOps is the set of practices that enables efficient deployment, monitoring, and maintenance of LLMs so that they deliver the desired output to users, through techniques including Prompt Engineering, Model Training & Fine-tuning, and Model Deployment and Monitoring.

In an earlier blog, we shared our primer on the transition from MLOps to LLMOps.

Prompt engineering is a crucial step in deploying LLMs and involves designing specific instructions, called prompts, that guide the LLM in generating desired outputs. This process involves converting one or more tasks into a prompt-based dataset, which is then utilized to train a language model. Well-designed prompts are crucial to ensure clarity of intent, establish context, control output style, mitigate biases, and avoid harmful content when working with LLMs. The availability of LLMs through APIs and open source has revolutionized AI product development by simplifying the workflow: users can directly interface with models and prototype custom functionality without extensive language-model expertise or parameter adjustments, opening up exciting possibilities for diverse applications and domains.
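As an illustration, a prompt template that pins down intent, context, output style and guardrails might look like the sketch below; `call_llm` is again our placeholder for the model endpoint, and the product and fields are invented.

```python
PROMPT_TEMPLATE = """You are a support assistant for {product}.
Answer in a {tone} tone, in at most {max_sentences} sentences.
If you are not sure of the answer, say so instead of guessing.

Customer question: {question}"""

def build_prompt(product: str, tone: str, max_sentences: int, question: str) -> str:
    # A well-designed prompt fixes intent, context, style and guardrails
    # up front, making the model's output predictable.
    return PROMPT_TEMPLATE.format(product=product, tone=tone,
                                  max_sentences=max_sentences, question=question)

answer = call_llm(build_prompt("AcmeDB", "friendly", 3,
                               "How do I rotate my API key?"))
```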

In addition to prompt engineering, frameworks such as LangChain and LlamaIndex are available. These frameworks let developers connect deployed models to external APIs and data sources, and orchestrate LLM interactions with end users. They also provide the capability to break down complex tasks into smaller subtasks: each subtask is mapped to a single step for the model to complete, and the output from each step is used as input for the subsequent step, allowing systematic and efficient completion of the overall task.
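The underlying chaining pattern can be sketched without any framework; the snippet below is a hand-rolled illustration of the idea, not LangChain’s or LlamaIndex’s actual API.

```python
def summarize(document: str) -> str:
    return call_llm(f"Summarize the following document:\n{document}")

def extract_action_items(summary: str) -> str:
    return call_llm(f"List the action items in this summary:\n{summary}")

def draft_email(action_items: str) -> str:
    return call_llm(f"Draft a follow-up email covering:\n{action_items}")

def run_chain(document: str) -> str:
    # A complex task broken into subtasks: each step's output
    # becomes the next step's input.
    return draft_email(extract_action_items(summarize(document)))
```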

LLMs can perform tasks like text generation, summarization, and coding, but they are not one-size-fits-all solutions. Fine-tuning is a powerful technique that involves training a pre-trained model on a specific dataset to tailor it for a particular use case. By fine-tuning the weights and biases, the model can be optimized to better fit the target dataset and improve its performance on the specific task at hand. This process allows the model to learn from the new data and adjust its parameters to make more accurate predictions or generate more relevant outputs. There are various approaches to fine-tuning LLMs, including supervised and unsupervised methods. Incorporating Reinforcement Learning from Human Feedback (RLHF) can further optimize LLM performance by training models with feedback from human evaluators, enabling continuous improvement over time.
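For a flavour of what supervised fine-tuning looks like in code, here is a minimal sketch using the open-source Hugging Face transformers and datasets libraries; the base model, toy dataset and hyperparameters are purely illustrative.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

base = "gpt2"  # illustrative small base model; any causal LM works similarly
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# A (toy) domain-specific corpus; real fine-tunes use thousands of examples.
ds = Dataset.from_dict({"text": ["Clause 4.2 limits liability to fees paid.",
                                 "Indemnity survives termination of the MSA."]})
ds = ds.map(lambda row: tokenizer(row["text"], truncation=True),
            remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-model", num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # nudges the pre-trained weights toward the target domain
```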

In production systems, Observability plays a crucial role in monitoring, evaluating, optimizing, and troubleshooting. However, when it comes to LLMs, there are unique challenges due to their black box nature. This makes it difficult to assess their performance and understand the outputs they generate. To address these challenges, the development of new testing and comparison frameworks is necessary to establish standardized evaluation methods. Effective monitoring and logging can help identify issues like model drift and ensure optimal performance.
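As a sketch of the kind of instrumentation this implies, here is a hand-rolled wrapper that logs per-call latency and flags outputs whose length drifts far from the recent average; production systems would use a dedicated observability stack, but the signals are similar.

```python
import logging
import statistics
import time

logging.basicConfig(level=logging.INFO)
recent_lengths = []  # rolling window of output lengths

def monitored_call(prompt: str) -> str:
    start = time.perf_counter()
    output = call_llm(prompt)  # placeholder LLM call, as before
    latency = time.perf_counter() - start

    # Log per-call metrics for later evaluation and troubleshooting.
    logging.info("latency=%.3fs prompt_chars=%d output_chars=%d",
                 latency, len(prompt), len(output))

    # Crude drift signal: flag outputs far from the recent average length.
    recent_lengths.append(len(output))
    window = recent_lengths[-100:]
    if len(window) >= 10 and statistics.stdev(window) > 0:
        if abs(len(output) - statistics.mean(window)) > 2 * statistics.stdev(window):
            logging.warning("possible drift: output length %d vs mean %.0f",
                            len(output), statistics.mean(window))
    return output
```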

By implementing these techniques, developers can enhance the efficiency of their LLMs to handle specific tasks, efficiently manage prompts, and continuously monitor the real-time performance of the models. These techniques also address challenges related to memory management, infrastructure, and model monitoring for drift. By integrating LLMOps into their workflow, developers can optimize the performance, cost, and latency of LLMs, ensuring reliability and cost efficiency in production environments. As LLMs become increasingly advanced, there will be an increased emphasis on interpretability and explainability in the outputs generated by these models, enabling better understanding of the decision-making process and identification of potential biases or errors.

Another serious bottleneck to Generative AI is Data itself.

To harness the potential of LLMs, one effective approach is to generate embeddings. Embeddings are data representations that capture semantic information, allowing AI systems to understand meaning and retain long-term memory. This capability is crucial for performing complex tasks successfully. However, the efficient storage and retrieval of embeddings require the use of Vector databases, specialized databases designed to handle high-dimensional vectors, such as the embeddings generated by LLMs. These databases are optimized for similarity search and can efficiently store and retrieve vectors based on their similarity to a query vector. They enable fast and scalable retrieval of embeddings, allowing AI systems to retrieve relevant information quickly.
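At its core, the similarity search a vector database performs can be sketched with brute force, as below; systems like FAISS or hosted vector databases add indexing structures so this scales far beyond a linear scan. `embed` is the same hypothetical helper as in the RAG sketch.

```python
import numpy as np

class TinyVectorStore:
    """Brute-force stand-in for a vector database."""

    def __init__(self):
        self.texts, self.vectors = [], []

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(embed(text))

    def search(self, query: str, top_k: int = 3):
        # Rank stored vectors by cosine similarity to the query vector.
        q = embed(query)
        m = np.array(self.vectors)
        scores = m @ q / (np.linalg.norm(m, axis=1) * np.linalg.norm(q) + 1e-9)
        best = np.argsort(scores)[::-1][:top_k]
        return [(self.texts[i], float(scores[i])) for i in best]
```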

LLMs require a lot of data: not just any data, but structured data that is labeled and annotated. Real-world data comes with many challenges, including limited availability, privacy concerns and a lack of diversity. Companies that can solve these data problems, at scale and for high-value industries, will, we think, see a lot of demand in the coming years.

In exploring data as a fuel (and bottleneck) to GenAI, we studied the Synthetic Data market.

Synthetic data is a type of information that is artificially created or generated by computer programs or algorithms, instead of being collected from real-life sources. It is designed to mimic or resemble real data, but it is not derived from actual observations or measurements.

Synthetic data offers advantages such as safeguarding privacy by tackling personally identifiable information (PII) and complying with data regulations like GDPR. It enables scalable AI applications, enhances diversity in AI models, and addresses the “cold start” problem for startups with limited data. Companies like Rockfish.ai, Gretel.ai, Tonic.ai, and Mostly.ai provide reliable synthetic data solutions. You can find our learnings in a separate blog post here.
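A toy sketch of the idea: generate records that match a real schema’s shape without containing any actual PII. Commercial synthetic-data products model the source distribution far more faithfully; the field names below are invented for illustration.

```python
import random

FIRST_NAMES = ["Asha", "Ravi", "Meera", "Karthik"]
CITIES = ["Chennai", "Bengaluru", "Pune", "Delhi"]

def synthetic_customer(idx: int) -> dict:
    # No field is drawn from a real person, so there is no PII to leak.
    return {
        "customer_id": f"CUST-{idx:05d}",
        "name": random.choice(FIRST_NAMES),
        "city": random.choice(CITIES),
        "age": random.randint(18, 75),
        "monthly_spend": round(random.lognormvariate(7, 0.5), 2),
    }

training_rows = [synthetic_customer(i) for i in range(1000)]
```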

Why we are excited about the Enablement Layer
~ In a gold rush, build picks and shovels ~

Our excitement in the enablement area arises from a combination of factors. We anticipate a growing need for applications that are either AI-first or leverage GenAI, and believe reliable underlying tooling is needed to help developers and LLM-focused data scientists do more with this technology.

We are also of the opinion that the enablement layer requires lower capital investment than the foundational or infrastructure/compute layers, yet has sufficient barriers to entry to build a tech moat. Unlike application layer companies, we also believe that distribution and operational execution might be less of a problem in the enablement layer.

Given India’s thriving developer ecosystem and the growing contributions to AI projects on GitHub from Indians, we cannot wait to see the innovations that get built from India in this space.

We also see a few shifts in technology stacks that GenAI will herald, some of which are listed below:

DevTools Reimagined

As mentioned in our previous post, developer tools have a tremendous opportunity to be transformed by AI, and GitHub Copilot represents a significant paradigm shift for software developers.

AI-powered developer tools are enabling engineers to build, test, and deploy applications at unprecedented speeds. These tools leverage code LLMs to assist developers in solving a breadth of programming-related tasks. Code LLMs are a unique class of LLMs that have been trained on a massive amount of source code across a variety of programming languages. They are adept at common programming-related tasks such as code generation and code reasoning. Many startups have come out with promising alternatives to GitHub Copilot, and new terminals and IDEs are reimagining the surface area where code is written and development work gets done. Code maintenance tools are leveraging code LLMs to automate tasks such as codebase migrations and code refactoring. The landscape of programming languages is constantly evolving, and new languages like Mojo are being created to address the changing needs and challenges in software development.
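To illustrate how such a tool sits in the workflow, here is a hedged sketch of prompting a code LLM for a refactoring task, with `call_llm` again standing in for whichever code model is behind it:

```python
def suggest_refactor(source: str) -> str:
    # Code LLMs handle code-reasoning tasks given the right prompt framing.
    prompt = ("You are a senior engineer. Refactor the following Python "
              "function for readability and explain each change in comments:\n\n"
              f"{source}")
    return call_llm(prompt)

legacy = "def f(x):\n    return [i*i for i in range(x) if i%2==0]"
print(suggest_refactor(legacy))
```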

Overall, the ongoing evolution of programming languages, coupled with the introduction of innovative tools and technologies, is reshaping the landscape of software development. Through the integration of code LLMs, semantic code search, LLM-based agents, and AI-powered automated testing, developers can experience heightened productivity, improved code quality, and increased efficiency in the software development lifecycle.

Our conviction lies in the notion that AI-driven developer tools will predominantly follow open-source principles and will lead to a substantial decrease in developer burden. We advocate for a fresh perspective on the conventional developer journey, suggesting a complete overhaul of how developers engage with code language models. This innovative approach has the power to greatly alleviate developer toil, with these dynamic toolchains operating in real-time and poised to entirely revolutionize the software development process.

Unlocking LLMs and Leveraging LLM-led Data Stack in Enterprise

In the realm of enterprises, there exists a multitude of scenarios where LLMs can be implemented. Among these scenarios, one such use case is search, where robust search capabilities are required to quickly and accurately retrieve information from large volumes of data across sources such as emails, documents, and CRMs.

Another common use case is natural language querying, which enables users to interact with data using conversational queries. LLMs are also used in customer support and service, where they can automate responses to frequently asked questions, analyze customer feedback, and provide personalized recommendations.
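Natural language querying is often implemented as NL-to-SQL translation: the model is shown the schema and asked to return only a query to run. A hypothetical sketch (the table and column names are invented):

```python
SCHEMA = "Table orders(order_id INT, customer TEXT, amount REAL, placed_on DATE)"

def nl_to_sql(question: str) -> str:
    # Give the model the schema as context and constrain it to output SQL.
    prompt = (f"Given this schema:\n{SCHEMA}\n"
              "Write a single SQL query answering the question. "
              "Return only SQL, no explanation.\n"
              f"Question: {question}")
    return call_llm(prompt)  # placeholder LLM call, as before

sql = nl_to_sql("What was our total order value last month?")
# The returned SQL would then be validated before running on the warehouse.
```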

Additionally, LLMs can assist in knowledge management by organizing and indexing large amounts of information, extracting key details from documents, and providing intelligent search capabilities within knowledge management systems.

These are just a few examples of how LLMs are applied in enterprise use cases; the specific use cases may vary depending on the industry and individual enterprise requirements. Deploying LLMs in an enterprise requires ensuring compatibility, which involves fine-tuning the system, offering explainability, and ensuring privacy and security measures are in place.

In the rapidly evolving landscape of the Modern Data Stack, the integration of LLMs holds the promise of reshaping traditional data stack architectures. The fundamental power of LLMs lies in their ability to process, understand and generate human-like text at unprecedented scale and complexity, extracting deeper insights that drive informed business decisions.

Enterprises are cautious about the integration of LLMs into their data stacks. Concerns around proprietary company data and the Personally Identifiable Information (PII) of employees and customers have raised questions about data security and privacy. Organizations need to ensure that sensitive information is not compromised when utilizing LLMs for data processing.

To address these concerns, enterprises are exploring the development of private LLMs trained using both public and proprietary data. This approach aims to generate accurate and secure answers to business questions while maintaining data privacy. By harnessing the power of LLMs within their data stacks, organizations can strike a balance between improved data processing capabilities and the protection of sensitive information.

Unlocking LLMs and their integration into the enterprise landscape presents a new horizon of possibilities, bridging the gap between data and actionable knowledge.

GenAI is a fundamental technology shift, one that is still nascent but will only continue to evolve and get better with time.

And this shift runs through the entire GenAI stack, at every layer: from CPUs to GPUs in the hardware infrastructure layer, from generic foundational models to domain-specific foundational models, from MLOps to LLMOps in the enablement layer, from vertical SaaS to Vertical-AI SaaS in the application layer, from the Modern Data Stack to the AI-led Data Stack, and many more.

We are excited by the multitude of possibilities these shifts will bring and have been avidly tracking developments in the space.

We have also been speaking to founders, engineering and product leaders, as well as data professionals, brainstorming ideas and sharing each other’s experiences. If you have been experimenting with LLMs (or using them in production) or have thoughts/feedback on our work, we would love to hear from you.

We want to learn from you about what is working in the market, and help you in any way we can. Please feel free to reach out to us at

dhanush.ram@specialeinvest.com or shobhankita.reddy@specialeinvest.com

Speciale Thanks to Ayush for his invaluable contribution in assembling this Generative AI Co-Pilot for us.