LLM security - Part 2

In the previous blog post, we gave an introduction to how LLMs work and started looking at various offensive measures that we use to break them. In this post, we will explore more offensive strategies, look at examples from the wild and also briefly touch upon the common defenses used by AI companies.

Also, most LLMs today have browsing capabilities. Here, the adversarial instructions are introduced by a third party data source like a web search or API call. You can make the LLM go to a particular website and load your malicious instruction from here and this is especially more prevalent with ChatGPT plugins and the upcoming GPT Store.

In the next blog post, we will look at various defensive measures.

CTF games to practice prompt injection

More resources and reading

