Ask HN: How do you add guard rails to LLM responses without breaking streaming?
Hi all, I am trying to build a simple LLM bot and want to add guard rails so that the LLM's responses are constrained. I tried adjusting the system prompt, but the responses do not always honour the instructions in the prompt. I can manually validate the full response, but that breaks streaming and makes the bot feel visibly slower. How are people handling this situation?
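
Roughly what I'm doing now, in pseudocode (the client and validator are stand-ins for my actual ones):

    def respond(prompt, client, is_allowed):
        """Naive post-hoc validation: buffers the entire stream."""
        chunks = []
        for token in client.stream(prompt):   # consume the whole stream first
            chunks.append(token)
        full = "".join(chunks)
        # the user sees nothing until this point, so streaming is effectively gone
        return full if is_allowed(full) else "Sorry, I can't answer that."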
Not sure about the exact nature of your project, but for something similar I've worked on, I had success using a combination of custom stop words and a bit of custom logic layered on top of the stream. By tuning the stop words to the domain and applying filters in real time as the data streams in, I was able to tailor responses to users' taste without giving up streaming. Depending on your use case, adding logic to dynamically adjust the stop words or weight them contextually might also help.
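
Rough sketch of the streaming side in Python (the BANNED list and the cut-the-stream-on-a-hit policy are just illustrative; swap in whatever your domain needs):

    BANNED = {"forbiddenword", "anotherbadword"}   # example domain stop words
    TAIL = max(len(w) for w in BANNED)

    def guarded_stream(tokens):
        """Yield text as it arrives, holding back a small tail so a
        banned word split across two chunks is still caught."""
        buf = ""
        for tok in tokens:                  # tokens: iterable of text chunks
            buf += tok
            if any(w in buf.lower() for w in BANNED):
                yield "[filtered]"
                return                      # cut the stream on a hit
            # flush everything except a tail that could still be the
            # start of a banned word continuing into the next chunk
            safe, buf = buf[:-TAIL], buf[-TAIL:]
            if safe:
                yield safe
        yield buf                           # flush the clean remainder

    # usage: wrap whatever chunk iterator your client gives you
    for piece in guarded_stream(["Hello ", "forbid", "denword", "!"]):
        print(piece, end="")

The trick is that you only ever hold back as many characters as the longest banned word, so the user-visible latency stays at roughly one chunk instead of the full response.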