- Queries and documents are tightly coupled and should be preprocessed the same way (same tokenization and normalization on both sides)
- Sparse retrieval (term matching) gets you precision but lacks context: synonyms and paraphrases go unmatched (see the sparse-retrieval sketch after these notes)
- Preprocessing is very hard
- Documents are not just plain text (structure, tables, and metadata all matter)
- Enrichment is extremely important
- The pipeline is: indexing, document enrichment, retrieval based on the query, then relevance ranking based on query context (pipeline sketch below)
- Dense retrieval is a lot more approximate: (approximate) nearest-neighbor search over embeddings (nearest-neighbor sketch below)
- You can’t compress text into a fixed-size vector without losing lots of information
- LLMs aren’t a silver bullet: hallucination, instability, model updates, and latency make them impossible to sub in for traditional pipelines
- They allow fine-tuning, but data protection, latency, and cost are issues; fine-tuned vendor models are way more expensive
- Paradigms: BERT = encoder-only, T5 = encoder-decoder, GPT = decoder-only (loading sketch below)
- Bloomberg will build all of these at various data scales and model sizes and let applications be built on top of them
- Use the LLM at the end, after the traditional inference steps (retrieval, ranking), for summarization and QA; this is RAG (RAG sketch below)
- Do training while doing inference
- LLMs are useful annotators and can be used as teacher models, distilling knowledge into small transformers (distillation sketch below)
- Queries will become more complex, question-like inputs; LLMs can sit at the UI level on both input and output (query-rewriting sketch below)
- Every company selling LLM products has a vested interest in making its papers and blog posts thinly veiled advertisements
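
A minimal sketch of the sparse-retrieval tradeoff: term matching is precise, but a paraphrased query that shares no vocabulary scores zero. The toy corpus and queries are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "quarterly earnings report for the bank",
    "central bank raises interest rates",
]
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)

# A query sharing terms with doc 0 matches; a synonym-only paraphrase scores 0.
for query in ["earnings report", "profit statement"]:
    scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
    print(f"{query!r} -> {scores.round(2)}")
```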
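
A sketch of the pipeline shape from the notes (index, enrich, retrieve, rank). Every function body here is a placeholder; the point is the stage boundaries, not the implementations.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    enrichments: dict = field(default_factory=dict)

def enrich(doc: Document) -> Document:
    # Placeholder enrichment: real systems tag entities, topics, metadata.
    doc.enrichments["terms"] = set(doc.text.split())
    return doc

def build_index(docs: list[Document]) -> list[Document]:
    # Placeholder: a real system builds an inverted index here.
    return [enrich(d) for d in docs]

def retrieve(index: list[Document], query: str) -> list[Document]:
    # Placeholder first-stage retrieval: cheap term overlap.
    q_terms = set(query.split())
    return [d for d in index if q_terms & d.enrichments["terms"]]

def rank(candidates: list[Document], query: str) -> list[Document]:
    # Placeholder relevance ranking conditioned on the query context.
    q_terms = set(query.split())
    return sorted(candidates, key=lambda d: -len(q_terms & d.enrichments["terms"]))

index = build_index([Document("central bank raises rates"), Document("bank earnings report")])
print([d.text for d in rank(retrieve(index, "bank rates"), "bank rates")])
```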
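
A toy illustration of dense retrieval as nearest-neighbor search. The 4-dimensional "embeddings" are made up; real systems embed with a trained encoder and search with approximate-NN libraries (e.g., FAISS or HNSW) rather than brute force.

```python
import numpy as np

doc_embeddings = np.array([
    [0.9, 0.1, 0.0, 0.1],  # doc 0
    [0.1, 0.8, 0.2, 0.0],  # doc 1
    [0.0, 0.2, 0.9, 0.1],  # doc 2
])
query = np.array([0.8, 0.2, 0.1, 0.0])

# Cosine similarity: normalized dot product; the argmax is the nearest neighbor.
sims = doc_embeddings @ query / (
    np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query)
)
print("nearest doc:", int(sims.argmax()), "scores:", sims.round(2))
```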
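
The three paradigms, loaded via Hugging Face `transformers`. The checkpoints here are the standard public ones, not Bloomberg's models.

```python
from transformers import AutoModel, AutoModelForSeq2SeqLM, AutoModelForCausalLM

encoder = AutoModel.from_pretrained("bert-base-uncased")     # encoder-only (BERT)
enc_dec = AutoModelForSeq2SeqLM.from_pretrained("t5-small")  # encoder-decoder (T5)
decoder = AutoModelForCausalLM.from_pretrained("gpt2")       # decoder-only (GPT)
```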
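
A minimal RAG sketch: traditional retrieval and ranking first, the LLM only at the end for QA/summarization. `retrieve_ranked` and `llm_generate` are hypothetical stand-ins for a retrieval stack and an LLM client.

```python
def answer(question: str) -> str:
    # Hypothetical retriever: returns the top-k ranked passages for the question.
    passages = retrieve_ranked(question, k=5)
    context = "\n\n".join(p.text for p in passages)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    # Hypothetical LLM call; swap in whatever client/model is available.
    return llm_generate(prompt)
```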
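
A sketch of LLM-as-teacher distillation: the LLM labels unlabeled text, and a small, cheap student is trained on those labels. The teacher here is a keyword stub standing in for a real LLM call, and TF-IDF plus logistic regression stands in for the small transformer student.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def teacher_label(text: str) -> str:
    # Stub: in practice, prompt the teacher LLM to classify the text.
    return "macro" if "rate" in text else "earnings"

unlabeled = [
    "central bank raises rates again",
    "rate cut expected next quarter",
    "strong quarterly profits reported",
    "earnings beat analyst estimates",
]
labels = [teacher_label(t) for t in unlabeled]  # LLM-quality labels, no human annotators

student = make_pipeline(TfidfVectorizer(), LogisticRegression())
student.fit(unlabeled, labels)  # the small model serves cheaply at inference time
print(student.predict(["profits slump as rates rise"]))
```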
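
A sketch of LLMs at the UI boundary: rewriting a complex natural-language question into something the traditional engine handles well on the input side (output-side answer phrasing works the same way). `llm_generate` is the same hypothetical client as in the RAG sketch.

```python
def rewrite_query(user_question: str) -> str:
    prompt = (
        "Rewrite this question as a short keyword search query:\n"
        f"{user_question}\n"
        "Query:"
    )
    return llm_generate(prompt)  # hypothetical LLM call, as above
```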