By Aleksandar Milenkoski & Phil Stokes
Recent weeks have seen ChatGPT catapult to the front of social media attention in Infosec circles after a flurry of tweets and postings appeared demonstrating how OpenAI’s ChatGPT can be utilized to automate and assist in various cybersecurity tasks.
In this post, we show how ChatGPT can bring value to reverse engineers and malware analysts, both those just entering the field as well as more experienced professionals. We will give specific examples and highlight some of the pitfalls for the following reverse engineering and malware analysis challenges:
- Learning how to use reverse engineering tools more effectively
- Teaching yourself assembly language
- Understanding how source code translates to disassembly
- Writing POC source code quickly
- Translating between instruction sets
- Comparing language or platform specific conventions
- Analyzing code segments in malware samples
- Identifying malicious activities in code
- Speculating on function purposes and objectives
- Understanding vulnerabilities and exploit code
- Automating reverse engineering tasks
Getting the Most Out of ChatGPT
Before we start, there are a few things to bear in mind when using ChatGPT.
First, ChatGPT has been trained using publicly available data. Its ability to provide accurate and up-to-date answers is no better than the accuracy and relevance of its training data. For example, ‘help’ questions on rapidly developing applications may refer to older versions than the one the user has to hand. Similarly, code excerpts in evolving languages like Go, Rust and Swift may use APIs or conventions that have already been superseded.
Even on topics that appear to be relatively well-established, ChatGPT’s output is very much ‘the thoughts of the crowd’, not ‘irrefutable facts’. Naturally, those two may often overlap, but it’s not necessarily always the case. Thus, think of ChatGPT’s output as the opinion of a well-informed guide that you need to confirm for yourself rather than the arbiter of truth on any given topic.
Second, ChatGPT does not provide what we might call ‘deterministic’ answers: it may give different answers to the same or similar questions at different times, and when asked in different ways. This can be used to advantage: rewording and refining a question can lead to better answers. The ‘Regenerate response’ option at the end of each answer can kick ChatGPT into providing more detail, and adding context to the question can help make answers more relevant.
With those caveats in mind, let’s start exploring how ChatGPT can help us solve some common reverse engineering and malware analysis challenges.
1. Learn How To Use Reverse Engineering Tools More Effectively
Software tools usually come with varying degrees of built-in help, and what they lack is usually made up for by dedicated user forums and Q&A sites like Stack Overflow, Stack Exchange and others. ChatGPT adds another avenue for getting rapid help on reverse engineering tools.
Whether you are using IDA Pro, Ghidra, Radare2, Hopper, Cutter or some other reverse engineering platform, ChatGPT can help. All these platforms have their own built-in help. However, if your issue is covered in ChatGPT’s training data, asking it specific questions about your own use case may be a faster way to get things done.
2. Teach Yourself Assembly Language
When it comes to the basics of assembly and memory management techniques of programming languages and platforms – topics that any prospective reverse engineer should be familiar with – ChatGPT excels at conveying relevant information.
For example, ChatGPT provides informative answers to inquiries about the basics of function invocation and the related stack memory management activities.
We can ask ChatGPT to be more or less verbose in its output. Here for example, we would like to get a visual representation of a stack frame but without the chatter.
Assembly code is platform- and compiler-specific. Suppose an assembly-related question does not specify a platform (i.e., an instruction set) or the higher-level language from which the assembly is compiled. In that case, ChatGPT adds disclaimers that frame its answer accordingly.
This provides a good learning experience, especially since a user can continue the conversation with follow-up questions that name a specific platform or compiler. An assembly student could benefit significantly from an ongoing conversation with ChatGPT on a given topic, so that ChatGPT covers more angles as the conversation progresses.
ChatGPT can also help with learning assembly by translating high-level code that the user is familiar with into assembly code. This bolsters learning by mapping familiar concepts to assembly. We observed that ChatGPT handles a variety of topics well, including non-trivial concepts that are crucial to learning assembly, such as pointers and function pointer invocations. ChatGPT’s responses typically include commented assembly code, which further increases the learning effect.
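For example, in the following C code, main invokes print_int through a function pointer:

```c
#include <stdio.h>

/* Prints its integer argument. */
void print_int(int param)
{
    printf("The value of the integer is %d\n", param);
}

int main(void)
{
    /* Pointer to a function taking an int and returning void. */
    void (*pi_ptr)(int) = &print_int;
    (*pi_ptr)(7);  /* indirect call through the pointer */
    return 0;
}
```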
3. Understand How Source Code Translates to Disassembly
As malware analysts, much of the time we are looking at malware through the eyes of a disassembler. Experience and knowledge of a programming language are vital here, but ChatGPT can help us learn how known source code looks in a disassembler, and how code changes are reflected in disassembly. Novices can use this to form theories about what some disassembled code might be doing. They can write their own source code and check whether it is suitably similar to the disassembly they are looking at. This can help less experienced analysts develop their understanding of malicious code.
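As a minimal illustration, suppose a loop in the disassembly appears to walk a buffer and XOR each byte with a constant. An analyst could write a few lines of C like the sketch below. They could then compile it, or ask ChatGPT for the corresponding disassembly, and compare the result with the sample. The function name xor_decode and the single-byte key are our own assumptions for illustration. This is not code taken from any particular sample.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-byte XOR decoder, written to test the theory that a
 * loop seen in the disassembly decodes a buffer in place. */
void xor_decode(unsigned char *buf, size_t len, unsigned char key)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        buf[i] ^= key; /* XOR is its own inverse: the same loop encodes and decodes */
    }
}
```

If the compiled loop closely resembles the one in the sample, that supports the theory, but it does not prove it.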
4. Write PoC Source Code Quickly
ChatGPT can even help us write the source code we need to test our theories. For example, we can ask the AI to write a function and then show us its disassembly.
However, sometimes ChatGPT needs a little coaxing. After writing our requested function, it decides to delegate the disassembly back to us.
Let’s ask it again. First we copy the code from the previous answer, and then paste it after giving our explicit instructions.
Now, we get the disassembly we were looking for.
5. Translate Between Instruction Sets
Given that assembly code is platform-specific, more experienced reverse engineers could leverage ChatGPT to inquire about different instruction sets than those they are already familiar with. One way to do this is by instructing ChatGPT to translate assembly code written in one instruction set to another.
This provides the basis for further exploring the instruction set of interest, for example, by querying ChatGPT about further information on instructions in the translated code.
6. Compare Language or Platform Specific Conventions
Experienced reverse engineers can also benefit from using ChatGPT to inquire about differences in the memory management techniques of programming languages and platforms, for example, calling conventions.
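As a concrete point of comparison, the C sketch below passes eight integer arguments. That is more than either of the common x86-64 conventions passes in registers. The comments summarize how the System V AMD64 ABI (used on Linux and macOS) and the Microsoft x64 convention (used on Windows) place the arguments. The name sum8 is hypothetical, and the exact instructions depend on the compiler and optimization level.

```c
/* Eight integer arguments, more than either x86-64 convention passes in
 * registers:
 * - System V AMD64 (Linux, macOS): a..f arrive in RDI, RSI, RDX, RCX,
 *   R8, R9; g and h are passed on the stack.
 * - Microsoft x64 (Windows): a..d arrive in RCX, RDX, R8, R9; e..h are
 *   passed on the stack, above the 32-byte shadow space the caller reserves.
 * Under both conventions the result is returned in RAX. */
long long sum8(long long a, long long b, long long c, long long d,
               long long e, long long f, long long g, long long h)
{
    return a + b + c + d + e + f + g + h;
}
```

One way to see the difference in practice is to ask ChatGPT to disassemble this function for both targets and compare its answer with a real compiler’s output.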
At the time of writing, ChatGPT has been trained on data up to 2021. As a result, ChatGPT does not provide current information about platform or high-level language specifics that have changed since then. One example of a change in a calling convention is the transition from a stack-based to a register-based calling convention in the Golang language.
Experienced reverse engineers, and malware analysts in particular, could use ChatGPT to get familiar with high-level constructs of increasingly popular programming languages and with how these constructs are represented in assembly. For instance, malware developers are increasingly adopting the memory-safe languages Golang and Rust.
7. Analyze Code Segments in Malware Samples
ChatGPT can interpret and analyze code, including both pseudocode and assembly, the two forms most relevant to reverse engineering. This makes ChatGPT useful for analyzing code segments of a malware executable, such as functions, mainly because it can summarize the activities the code performs.
This can significantly increase the efficiency of a malware reverse engineer. For example, the Gepetto IDA Pro plugin integrates ChatGPT into IDA Pro and queries the language model to explain functions decompiled by the Hex-Rays decompiler.
The ability to interpret code also makes it possible to compare code, helping malware analysts understand how different malware samples differ in their implementation.
However, analysts typically need the code’s functionality summarized at a descriptive level. To do that, ChatGPT may lack the broader context about the executable under analysis that a human analyst may have.
When an analyst provides little or no context, the model is most immediately useful if the code under analysis is self-contained with respect to its purpose. In practice, this typically means that the code does not invoke user-defined functions that extend its functionality in ways unknown to ChatGPT. Any functions it does invoke should be known, publicly documented library functions. Since ChatGPT is trained on publicly available data, the language model can already accurately explain the use of such functions in user-provided code.
For example, if pseudocode provided to ChatGPT references publicly documented library functions, ChatGPT’s explanation about the purpose of the code is centered around the functionalities of those functions.
For better-informed code analysis output from ChatGPT, it is up to the user to:
- formulate detailed ChatGPT queries that provide the needed context;
- engage in a conversation with ChatGPT to provide context during the dialogue and refine ChatGPT’s answers;
- try the ‘regenerate response’ option at the end of an answer, which appears to function as a kind of ‘try harder’ instruction to ChatGPT.
Adding more context to ChatGPT can include functionalities of user-defined functions that a human analyst has knowledge about. Contextual information can be provided in a programmatic manner to reduce the manual effort of a human analyst, for example, through a disassembler plugin developed for that purpose.
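One way such a plugin might supply that context is to prepend the analyst’s notes on user-defined functions to the code before sending it. The C sketch below only builds the prompt text. Sending it to a model is out of scope. The struct func_note type, the build_prompt and append functions, and the prompt wording are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of what the analyst already knows about a
 * user-defined function in the sample. */
struct func_note {
    const char *name;    /* current name in the disassembler */
    const char *summary; /* analyst's one-line description */
};

/* Appends src to the NUL-terminated string in dst (capacity cap),
 * truncating if needed. Returns 0 if it fit, -1 if truncated. */
static int append(char *dst, size_t cap, const char *src)
{
    size_t len = strlen(dst), n = strlen(src);
    if (len + n + 1 > cap) {
        memcpy(dst + len, src, cap - len - 1); /* copy what fits */
        dst[cap - 1] = '\0';
        return -1;
    }
    memcpy(dst + len, src, n + 1);
    return 0;
}

/* Builds a prompt that lists known user-defined functions ahead of the
 * code. Returns 0 if the whole prompt fit in out, -1 if it was truncated. */
int build_prompt(char *out, size_t cap, const struct func_note *notes,
                 size_t n_notes, const char *code)
{
    assert(out != NULL && cap > 0 && code != NULL);
    out[0] = '\0';
    int rc = 0;
    if (n_notes > 0)
        rc |= append(out, cap, "Context: the code calls these user-defined functions:\n");
    for (size_t i = 0; i < n_notes; i++) {
        rc |= append(out, cap, "- ");
        rc |= append(out, cap, notes[i].name);
        rc |= append(out, cap, ": ");
        rc |= append(out, cap, notes[i].summary);
        rc |= append(out, cap, "\n");
    }
    rc |= append(out, cap, "Explain what the following code does.\nCode:\n");
    rc |= append(out, cap, code);
    return rc;
}
```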
The same applies to improving non-technical aspects of ChatGPT’s output, such as its length, style and format. For example, ida_gpt, an IDA Pro plugin that assists assembly code analysis by querying ChatGPT, formulates the following queries for analyzing and refactoring assembly code, respectively.
Analysis query:
Reference notable constants used.
Provide a description of what it could be used for, or a name for the function if it is familiar to any known purpose.
Please write this succinctly and in present tense.
Do not say anything you are unsure of.
Start the response with “The function” and respond in less than 200 words.
Disassembly:
[user-provided disassembly code]
Refactoring query:
For any variable name you identify starting with “var”, “loc”, “arg” or “sub”, please suggest a more helpful name in a list format.
The format of the list should be in the format “old_variable_name:new_variable_name”, with each list entry on a new line.
Do not print anything besides the list and a suggested name for the function, starting with “func_”, which should be at the end.
Disassembly:
[user-provided disassembly code]
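A structured format like the refactoring query’s “old_variable_name:new_variable_name” list is useful because the reply can be parsed and applied automatically. Below is a minimal C sketch of such a parser. It assumes the model follows the requested format exactly, with no extra whitespace or commentary. The names parse_renames and struct rename are hypothetical and are not part of ida_gpt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One rename suggestion parsed from the model's reply. */
struct rename {
    char old_name[64];
    char new_name[64];
};

/* Parses "old:new" lines into out (at most max entries). A line without a
 * colon that starts with "func_" is taken as the suggested function name;
 * anything else is ignored. Returns the number of renames stored. */
size_t parse_renames(const char *reply, struct rename *out, size_t max,
                     char *func_name, size_t func_name_size)
{
    assert(reply != NULL && func_name != NULL && func_name_size > 0);
    size_t count = 0;
    func_name[0] = '\0';
    while (*reply != '\0') {
        size_t line_len = strcspn(reply, "\n");
        const char *colon = memchr(reply, ':', line_len);
        if (colon != NULL && count < max) {
            size_t old_len = (size_t)(colon - reply);
            size_t new_len = line_len - old_len - 1;
            if (old_len > 0 && new_len > 0 &&
                old_len < sizeof out->old_name && new_len < sizeof out->new_name) {
                memcpy(out[count].old_name, reply, old_len);
                out[count].old_name[old_len] = '\0';
                memcpy(out[count].new_name, colon + 1, new_len);
                out[count].new_name[new_len] = '\0';
                count++;
            }
        } else if (colon == NULL && strncmp(reply, "func_", 5) == 0 &&
                   line_len < func_name_size) {
            memcpy(func_name, reply, line_len);
            func_name[line_len] = '\0';
        }
        reply += line_len;          /* skip the line... */
        if (*reply == '\n') reply++; /* ...and its terminator */
    }
    return count;
}
```

A plugin would then apply each pair through the disassembler’s renaming API. That step is omitted here.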
8. Identify Malicious Activities in Code
Malware analysts can use ChatGPT to identify indicators of potential malicious activities that a function may implement. This is important for mapping functions in a malware executable to specific malicious capabilities, similar to what the capa IDA Pro plugin does.
In this context, we observed that ChatGPT can rank the indicators of malicious activity within a function by strength. A malware analyst can therefore focus the interaction with ChatGPT on the strongest indicator and discuss it in greater detail.
For example, given a function’s pseudocode, ChatGPT identified the execution of vssadmin.exe as the strongest indicator of malicious activity and elaborated on the topic.
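ChatGPT arrives at such rankings on its own, but the underlying idea of weighing indicators can be illustrated with a deliberately crude keyword heuristic. In the C sketch below, the patterns and weights in INDICATORS are illustrative assumptions, not a vetted ruleset. A substring match is also far weaker evidence than the semantic analysis ChatGPT or capa performs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical indicator: a substring to look for and an analyst-assigned
 * weight (higher means stronger evidence of malicious intent). */
struct indicator {
    const char *pattern;
    int weight;
};

/* Illustrative entries and weights only. */
static const struct indicator INDICATORS[] = {
    { "vssadmin.exe",   90 }, /* shadow copy manipulation */
    { "delete shadows", 90 },
    { "CryptEncrypt",   60 }, /* CAPI encryption routine */
    { "FindFirstFileW", 20 }, /* file enumeration */
};

/* Returns the highest-weighted indicator found in text, or NULL if none.
 * On a tie, the entry listed first wins. */
const struct indicator *strongest_indicator(const char *text)
{
    assert(text != NULL);
    const struct indicator *best = NULL;
    for (size_t i = 0; i < sizeof INDICATORS / sizeof INDICATORS[0]; i++) {
        if (strstr(text, INDICATORS[i].pattern) != NULL &&
            (best == NULL || INDICATORS[i].weight > best->weight)) {
            best = &INDICATORS[i];
        }
    }
    return best;
}
```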
9. Speculate on Function Purposes and Objectives
In addition to identifying malicious activity indicators, malware analysts can further converse with ChatGPT to speculate on and better understand how malware uses particular platform or software constructs and to what end. This can be productive even in scenarios where the analyst does not provide comprehensive context.
For example, the ransomware pseudocode in our example uses the Microsoft Cryptographic API (CAPI) for encrypting data. CAPI is the predecessor of the Cryptography API: Next Generation (CNG) cryptographic architecture.
10. Understand Vulnerabilities and Exploit Code
Understanding how vulnerabilities work, how they may be exploited by malware authors, and how we can identify and detect their use in code is an extremely challenging task. ChatGPT can help us here, too.
Let’s take a recent exploit of CVE-2022-468889 and see if ChatGPT can help us understand how the code works.
ChatGPT’s first attempt gives us the following explanation.
The AI’s initial answer is OK, but it clearly doesn’t understand the wider context of the vulnerability. We can help it out by providing more information. Because ChatGPT is context aware, we don’t need to repeat the previous question or paste the previous code again.
Let’s see what answer it provides now.
Because of ChatGPT’s contextual awareness, it’s possible for the researcher to dig down into any specific parts of this explanation that they would like more information about.
As we saw in previous challenges, we could also ask for a representation in disassembly to see how some or all of the exploit code might look in a malware sample.
11. Assist in Automating Reverse Engineering Tasks
Reverse engineers turn to scripting languages to automate tasks that are repetitive or error-prone when done manually, such as renaming variables or deobfuscating code at scale. This can significantly speed up reverse engineering work and make it more efficient. ChatGPT is capable of writing code, including IDAPython, the Python scripting interface of the IDA Pro disassembler.
ChatGPT is currently trained on data up to 2021, and IDAPython changes regularly. As a result, we observed that ChatGPT often writes outdated IDAPython scripts. The most practical use of ChatGPT-generated IDAPython code is therefore probably as template code. The user may have to adjust it to a minor or moderate extent so that it works in current deployments. This typically involves changing referenced modules and function names to accommodate changes in the IDAPython API. Such template code is especially practical when the IDAPython code that needs to be written is of considerable size.
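As an example of the kind of adjustment involved, the C sketch below rewrites a few legacy idc function names in script text to their IDA 7.0-era equivalents. For instance, ScreenEA became get_screen_ea and GetFunctionName became get_func_name. The mapping table is a small, incomplete subset that we believe reflects the IDA 7.0 renames. Check Hex-Rays’ IDAPython porting guide before relying on it. Argument lists sometimes changed as well, so a purely textual rename is only a starting point, not a full port. The name port_idapython is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Small, incomplete subset of idc renames (legacy name, current name). */
static const char *const RENAMES[][2] = {
    { "ScreenEA",        "get_screen_ea" },
    { "GetFunctionName", "get_func_name" },
    { "MakeName",        "set_name" },
    { "GetMnem",         "print_insn_mnem" },
    { "GetOpnd",         "print_operand" },
};

static bool is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Copies src to dst (capacity cap), replacing whole-identifier matches of
 * legacy names. Returns false if dst is too small; dst stays terminated. */
bool port_idapython(const char *src, char *dst, size_t cap)
{
    assert(src != NULL && dst != NULL && cap > 0);
    size_t out = 0;
    for (size_t i = 0; src[i] != '\0';) {
        const char *repl = NULL;
        size_t match_len = 0;
        if (i == 0 || !is_ident_char(src[i - 1])) { /* identifier start only */
            for (size_t k = 0; k < sizeof RENAMES / sizeof RENAMES[0]; k++) {
                size_t len = strlen(RENAMES[k][0]);
                if (strncmp(src + i, RENAMES[k][0], len) == 0 &&
                    !is_ident_char(src[i + len])) {
                    repl = RENAMES[k][1];
                    match_len = len;
                    break;
                }
            }
        }
        const char *piece = repl ? repl : src + i;
        size_t piece_len = repl ? strlen(repl) : 1;
        if (out + piece_len >= cap) { /* keep room for the terminator */
            dst[out] = '\0';
            return false;
        }
        memcpy(dst + out, piece, piece_len);
        out += piece_len;
        i += repl ? match_len : 1;
    }
    dst[out] = '\0';
    return true;
}
```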
Conclusions
These are just some of the ways reverse engineers and malware analysts can leverage ChatGPT to help them in their work. And as we consider other ways we can utilize this remarkable tool, let’s not forget we can ask ChatGPT to help us with this, too. This is what ChatGPT had to say about “11 Problems ChatGPT Can Solve For Reverse Engineers and Malware Analysts”.
ChatGPT can:
- generate explanations and summaries of the functions and actions performed by malicious code, which can help reverse engineers and malware analysts understand its purpose and behavior.
- assist with the task of disassembling and decompiling code, breaking it down into smaller, more manageable chunks for analysis.
- help reverse engineers and malware analysts understand the relationships between different parts of a codebase and how they work together, which can be useful for identifying and understanding code dependencies.
- assist with the process of identifying and understanding code vulnerabilities by generating explanations and summaries of vulnerabilities and their potential impacts.
- help reverse engineers and malware analysts understand the techniques used to obfuscate code, which can be useful for analyzing and deobfuscating malicious code.
- assist with the task of generating documentation and reports on code analysis and malware analysis findings, helping to communicate findings and results to stakeholders.
- provide guidance and suggestions for further analysis, helping reverse engineers and malware analysts to prioritize their efforts and focus on the most important aspects of their work.
- be used to create educational materials and exercises for reverse engineering and malware analysis training, helping to build skills and knowledge in these areas.
- help facilitate collaboration between team members by providing a shared repository of information and analysis findings, which can help improve efficiency and effectiveness.
- assist with the generation of test cases and scenarios for code and malware analysis, helping to ensure that analysis is thorough and comprehensive.
- provide assistance with legal and forensic investigations by generating explanations and summaries of code and malware behavior, which can be useful for building a case and demonstrating the impacts of malicious activity.
Over the last few weeks of our experiments, we found that ChatGPT is a valuable tool that can help reverse engineers of all skill levels.
For beginners, ChatGPT can provide a comprehensive introduction to the concepts and skills needed to master reverse engineering, such as the basics of assembly language and the background knowledge required to understand how programs are constructed and function.
For experienced reverse engineers and malware analysts, ChatGPT can be used to automate and speed up reverse engineering tasks, such as analyzing code and understanding its functionality. The value of ChatGPT’s answers to reverse engineers and malware analysts depends on the amount of context provided to the language model. The human user can provide this context by issuing context-rich queries to ChatGPT and/or by conversing with ChatGPT to refine its answers.
In the future, ChatGPT has the potential to become even more powerful and useful for reverse engineers and malware analysts. With continued development, it may be possible to overcome some of its current limitations, such as its dependence on a finite training dataset with a fixed cutoff date. By addressing these limitations, ChatGPT could become an indispensable tool for reverse engineers and analysts, providing the information needed to analyze code accurately and efficiently.