Ask HN: How do you add guard rails to LLM responses without breaking streaming?

2 points by curious-tech-12 8 hours ago

Hi all, I am trying to build a simple LLM bot and want to add guard rails so the responses stay constrained. I tried adjusting the system prompt, but the model does not always honour the instructions in the prompt. I can validate the full response after generation, but that means buffering the whole thing before showing anything, which breaks streaming and makes the bot feel visibly slower. How are people handling this situation?
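For concreteness, this is roughly the shape of what I have now (simplified Python; the chunk iterator and the moderate callable are stand-ins for my actual SDK call and checker):

    def answer(chunks, moderate):
        # chunks: iterator of text deltas from the model (SDK-specific)
        # moderate: callable that returns False on a policy violation
        parts = []
        for chunk in chunks:
            parts.append(chunk)      # buffering instead of forwarding
        full = "".join(parts)        # nothing has reached the user yet
        if not moderate(full):       # validation needs the complete text
            return "Sorry, I can't help with that."
        return full                  # the user sees nothing until this point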

throwaway888abc 6 hours ago

Not sure about the exact nature of your project, but on something similar I’ve worked on, I had success combining custom stop words with streaming and a bit of custom logic layered on top. By tuning the stop words to the domain and applying filters in real time as the data streams in, I was able to tailor the responses to users’ tastes without giving up streaming. Depending on your use case, logic that dynamically adjusts the stop words or weights them by context might also help you. A rough sketch of the filtering idea is below.
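Something like this toy Python sketch (BLOCKED and the chunk iterator are placeholders for your domain-specific phrase list and SDK, not code I actually shipped):

    BLOCKED = ["internal use only", "account number"]  # domain phrases, lowercase

    def filtered_stream(chunks, blocked=BLOCKED):
        # Hold back a tail one char shorter than the longest phrase, so a
        # blocked phrase split across chunk boundaries is never emitted
        # before the check can see all of it.
        hold = max(len(p) for p in blocked) - 1
        buf = ""
        for chunk in chunks:
            buf += chunk
            if any(p in buf.lower() for p in blocked):
                yield " [response withheld]"
                return                   # kill the stream on a hit
            if len(buf) > hold:
                cut = len(buf) - hold
                yield buf[:cut]          # safe: no phrase can still span it
                buf = buf[cut:]
        yield buf                        # flush the tail when the stream ends

The key detail is the held-back tail: anything already yielded can't be unsent, so you only emit text once no blocked phrase could still be completing inside it. Consuming it is just a loop (model_chunks being whatever your SDK's streaming iterator yields):

    for text in filtered_stream(model_chunks):
        print(text, end="", flush=True)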