Safety Best Practices
Leverage our complimentary Moderation API
QX LABS PTE. LTD.’s Moderation API is available for free and can assist in minimising the occurrence of inappropriate content in your generated outputs. Alternatively, you may opt to create a custom content filtration system tailored to your specific use case.
Adversarial testing
We recommend conducting “red-team testing” on your application to ensure its resilience to adversarial input. Test your product across a diverse range of inputs and user behaviours, covering both a representative set and scenarios simulating attempts to ‘challenge’ your application. Evaluate whether it deviates from the intended topic and if it can be easily manipulated through prompt injections, such as instructing it to “disregard the previous instructions and perform this action instead.”
Human in the Loop (HITL)
Whenever feasible, we recommend incorporating human oversight to review outputs before practical use. This is particularly crucial in high-stakes domains and for tasks involving code generation. Humans should be informed about the system’s limitations and have access to any necessary information for verifying outputs (for example, if an application summarises notes, a human should easily access the original notes for reference).
Prompt Engineering
Utilising “prompt engineering” proves effective in constraining the topic and tone of output text. This minimises the likelihood of generating undesired content, even when a user attempts to elicit it. Offering additional context to the model (such as providing a few high-quality examples of desired behaviour before presenting new input) can facilitate steering model outputs in desired directions.
“Know Your Customer” (KYC)
Typically, users should register and log in to access your service. Linking this service to an existing account, such as Gmail, LinkedIn, or Facebook log-in, may be beneficial, but its appropriateness depends on the specific use-case. Requesting a credit card or ID card can further reduce risk.
Constrain User Input and Limit Output Tokens
To prevent prompt injection, it’s advisable to limit the amount of text a user can input into the prompt. Reducing the number of output tokens helps minimise the risk of misuse. Narrowing the ranges of inputs or outputs, especially from trusted sources, curtails the potential for misuse within an application. Validated dropdown fields (e.g., a list of movies on Wikipedia) for user inputs can be more secure than open-ended text inputs.
Return Outputs from Validated Sets
Returning outputs from a validated set of materials on the backend, whenever possible, is safer than generating novel content. For instance, routing a customer query to the most relevant existing customer support article rather than attempting to answer the query from scratch can enhance security.
Enable Issue Reporting
Users should have an easily accessible method for reporting improper functionality or concerns about application behaviour (such as a listed email address or ticket submission method). This reporting method should be monitored by a human and addressed appropriately.
Understand and Communicate Limitations
From generating inaccurate information to producing offensive outputs and exhibiting bias, language models may not be suitable for every use case without significant modifications. Assess whether the model is suitable for your purpose and evaluate the API’s performance across a diverse range of potential inputs to identify scenarios where performance might decline. Consider your customer base and the variety of inputs they might use, ensuring their expectations are appropriately calibrated.