Example use cases:
How it works:
Via the API, your software sends a prompt to the LLM, receives a response, and then integrates that response back into the workflow. This is ideal for businesses that want a flexible and dynamic AI tool without having to modify the model itself.
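As a rough illustration, the sketch below shows what a single round trip might look like with OpenAI's Python SDK; the model name and the construction-themed prompt are placeholders I've assumed for the example, and other providers follow a similar request/response pattern.

```python
# A minimal sketch of the API pattern described above, using OpenAI's Python SDK.
# The model name and prompt are illustrative placeholders; adapt them to your provider.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

def ask_llm(question: str) -> str:
    """Send a prompt to the hosted LLM and return the text of its reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name; substitute whichever model you use
        messages=[
            {"role": "system", "content": "You are an assistant for a construction company."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# Your software then integrates the reply back into its own workflow,
# e.g. attaching it to a project record or showing it in an internal tool.
print(ask_llm("Summarize the key safety requirements for scaffolding work."))
```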
2. Fine-tuning
Fine-tuning is a more advanced customization method where the LLM is retrained with your specific business data. This process allows the model to learn the nuances of your industry, whether it’s the unique language used in contracts, regulations, or technical reports in the construction field.
Fine-tuning adjusts the model's weights based on the new data, making the model’s understanding more specific to your domain. This is particularly useful when you need the model to generate consistent, specialized outputs, such as legal documents, highly technical reports, or detailed project proposals.
Example:
A construction company can fine-tune an LLM using thousands of documents related to building codes, safety standards, or contract language. After fine-tuning, the model would be able to generate custom legal documents or respond accurately to technical queries about site regulations.
How it works:
You gather your business-specific dataset (e.g., contracts, internal documents, reports) and use that to retrain the LLM. This process usually involves several steps:
- Data preprocessing: Cleaning and formatting your data to ensure it's suitable for training.
- Training: Fine-tuning the pre-trained LLM with your data over multiple iterations (epochs), allowing the model to learn domain-specific language and patterns.
- Deployment: Once fine-tuned, the model is deployed and ready to respond to inputs with the specialized knowledge it has learned.
While fine-tuning requires more technical resources and sufficient data — typically 10,000 or more relevant samples — the result is a model that can generate highly accurate, industry-specific responses.
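For teams that run fine-tuning themselves rather than through a vendor's managed service, the sketch below walks through the preprocessing, training, and deployment steps with the Hugging Face transformers and datasets libraries. The base model, the tiny in-line dataset, and the hyperparameters are all illustrative assumptions; a real run would use thousands of cleaned documents.

```python
# Illustrative fine-tuning sketch using Hugging Face transformers/datasets.
# The base model, the two-example dataset, and the hyperparameters are placeholders;
# a realistic run would use a large, cleaned corpus of your own documents.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "gpt2"  # assumed small base model for demonstration
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no dedicated pad token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Data preprocessing: cleaned, formatted business text (contracts, reports, ...).
texts = [
    "Section 4.2: The contractor shall comply with all local building codes.",
    "Safety standard: Scaffolding must be inspected before each work shift.",
]
dataset = Dataset.from_dict({"text": texts})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Training: several passes (epochs) over the domain data adjust the model's weights.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-construction", num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Deployment: save the fine-tuned weights so they can be loaded for inference.
trainer.save_model("finetuned-construction")
```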
3. Embedding Retrieval and Retrieval-Augmented Generation (RAG)
Another powerful method for customizing LLMs is embedding retrieval. With this method, your business data is stored in a vector database, and the most relevant pieces of it are retrieved and supplied to the model as context when it generates responses.
Embeddings are numerical representations of text, created by transforming words, sentences, or documents into high-dimensional vectors that represent their meaning. These vectors are stored in a database and can be queried based on their similarity to an input prompt.
For example, let’s say your construction company has a database of documents containing local building codes. When the LLM receives a query like "What are the building permit requirements in Seattle?", it can retrieve the most relevant documents from the vector database, providing specific, up-to-date information in its response.
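To make the idea concrete, the short sketch below uses the sentence-transformers library to turn a few code-related snippets into vectors and rank them against a query by cosine similarity; the model name and the sample texts are assumptions for illustration, not part of any specific product.

```python
# Minimal embedding sketch using sentence-transformers (model name is an assumed example).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Seattle requires a building permit for structures over 200 square feet.",
    "Concrete must cure for at least 7 days before formwork is removed.",
    "Permit applications in Seattle are reviewed by the city's permitting department.",
]
doc_embeddings = model.encode(documents)  # one high-dimensional vector per document

query = "What are the building permit requirements in Seattle?"
query_embedding = model.encode(query)

# Cosine similarity scores: higher means closer in meaning to the query.
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
for doc, score in sorted(zip(documents, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {doc}")
```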
Retrieval-Augmented Generation (RAG) takes this concept further by combining the power of LLMs with real-time data retrieval. In a RAG system, when a query is made, the most relevant documents are first retrieved from a vector database based on the query's embedding, and the LLM then generates an answer by combining the retrieved information with its own pre-trained knowledge.
Example:
A project manager asks, "What are the latest changes to building codes in Seattle?" The system retrieves up-to-date documents from the vector database, and the LLM generates a response that combines this external knowledge with its own understanding of construction-related regulations.
How it works:
- Embedding model: Convert your documents into vector embeddings and store them in a vector database (e.g., Azure Cosmos DB with vector search, or FAISS).
- Querying: When a query is made, the same embedding model converts the query into a vector, and the most semantically similar documents are retrieved from the database.
- Response generation: The retrieved documents are included in the prompt as context, allowing the LLM to generate an accurate response based on your business data and its pre-trained knowledge, as sketched below.
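The sketch below strings these three steps together, using FAISS as the vector store and the same kind of hosted chat API shown earlier. The index setup, model names, and sample documents are all assumptions chosen for illustration, not a prescribed stack.

```python
# Illustrative RAG pipeline: embed documents, index them in FAISS,
# retrieve the closest matches for a query, and pass them to the LLM as context.
# Model names and sample documents are assumptions for the sketch.
import faiss
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = OpenAI()

# 1. Embedding model: convert documents to vectors and store them in a vector index.
documents = [
    "Seattle building permits are required for most new construction and major remodels.",
    "Seattle has adopted updated energy code requirements for commercial buildings.",
    "Fall protection is required for construction work at heights of six feet or more.",
]
doc_vectors = np.asarray(embedder.encode(documents), dtype="float32")
index = faiss.IndexFlatL2(doc_vectors.shape[1])  # exact L2 search over the embeddings
index.add(doc_vectors)

# 2. Querying: embed the question and retrieve the most similar documents.
question = "What are the latest changes to building codes in Seattle?"
query_vector = np.asarray(embedder.encode([question]), dtype="float32")
_, top_ids = index.search(query_vector, 2)
retrieved = [documents[i] for i in top_ids[0]]

# 3. Response generation: include the retrieved text in the prompt as context.
context = "\n".join(retrieved)
answer = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name
    messages=[
        {"role": "system", "content": "Answer using the provided context where relevant."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(answer.choices[0].message.content)
```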
This method is ideal for dynamic information retrieval, ensuring that the LLM uses the most current and relevant data to assist with specific tasks, without requiring a full fine-tuning process.