Individual machine instructions make up the biggest source of information when the decompiler analyzes a function. Instructions are translated from their processor-specific form into Ghidra's IR language (see “P-code”), which provides both the control-flow behavior of the instruction and the detailed semantics describing how the processor and memory state are affected. The translation is controlled by the underlying processor model and, except in limited circumstances, cannot be directly altered from the tool. Flow Overrides (see below) can change how certain control-flow is translated, and, depending on the processor, context registers may affect p-code (see “Context Registers”).
Outside of the tool, users can modify the model specification itself. See the document "SLEIGH: A Language for Rapid Processor Specification".
Decompiling a function starts by analyzing control-flow from the function's first instruction. Control-flow is traced to additional instructions using flow information from the underlying processor model. All paths are traced through instructions with fall through, conditional jump, and other semantics until an instruction with terminator semantics is reached, which is usually a "return from subroutine" instruction. Flow is not traced into called functions; instructions with call semantics are treated as if they simply fall through.
An entry point is the address of the function's first instruction.
A function body is the set of addresses reached by control-flow analysis (and the machine instructions at those addresses).
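The tracing described above can be sketched as a simple worklist traversal. This is a toy model, not Ghidra's implementation: the `PROGRAM` table, the flow-kind labels, and `trace_function_body` are all hypothetical names introduced for illustration.

```python
from collections import deque

# Hypothetical toy instruction table (an assumption for this sketch):
# address -> (length in bytes, flow kind, branch target or None).
PROGRAM = {
    0x100: (2, "fall", None),
    0x102: (2, "cond_jump", 0x108),
    0x104: (2, "call", 0x200),   # call: treated only as fall-through
    0x106: (2, "jump", 0x10A),
    0x108: (2, "fall", None),
    0x10A: (1, "return", None),  # terminator
    0x200: (1, "return", None),  # belongs to the called function
}

def trace_function_body(program, entry):
    """Return the set of instruction addresses reachable from the entry point."""
    body = set()
    worklist = deque([entry])
    while worklist:
        addr = worklist.popleft()
        if addr in body or addr not in program:
            continue
        body.add(addr)
        length, kind, target = program[addr]
        if kind in ("fall", "call", "cond_jump"):
            worklist.append(addr + length)  # fall-through successor
        if kind in ("jump", "cond_jump"):
            worklist.append(target)         # branch destination
        # "return" adds no successors, so the path ends here
    return body

body = trace_function_body(PROGRAM, 0x100)
print(sorted(hex(a) for a in body))
```

Note that address `0x200` never enters the body: the call at `0x104` contributes only its fall-through successor.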
The entry point address for a function plays a pivotal role for analysis using the Ghidra decompiler. Ghidra generally associates a formal Function Symbol and an underlying Function object at this address, which are the key elements that need to be present to trigger decompilation. (See Functions) The Function object stores the function body, parameters, local variables, and other information critical to the decompilation process.
Function Symbols and Function objects are generally created automatically by a Ghidra analyzer when initially importing a binary executable and running auto-analysis. If necessary, however, a user can manually create a Function object from the Listing window by using the Create Function command (pressing the 'f' key) when the cursor is placed on the function's entry point. (See Create Function)
When a function is created, Ghidra stores its function body as a set of addresses in the Program database. This formal function body delineates the function from all the other kinds of data within the Program and lets Ghidra immediately link addresses in the middle of the function to the entry point and the Function object. Decompiler windows in particular use the formal function body to know which function to decompile in response to a navigation event to an arbitrary address.
The decompiler does not use the formal function body when it computes control-flow; it recomputes its own idea of the function body starting from the entry point it is handed. If the formal function body was created manually, using a selection for instance, or in other extreme circumstances, the decompiler's view of the function body may not match the formal view. This can lead to confusing behavior, where clicking in a decompiler window may unexpectedly navigate the window away from the function.
Control-flow behavior for a machine instruction is generally determined by its underlying p-code (see “P-code Control Flow”), but this can be changed by applying a Flow Override. A Flow Override maintains the overall semantics of a branching instruction but changes how the branch is interpreted. For instance, a JMP instruction, which traditionally represents a branch within a single function, can be overridden to represent a call to a new function.
Flow Overrides are applied by Analyzers or manually by the user.
The decompiler automatically incorporates any relevant Flow Overrides into its analysis of a function. This can have a significant impact on results. The types of possible Flow Overrides include:
BRANCH override: Treats the primary CALL or RETURN behavior of the instruction as if it were a BRANCH within the function. For CALL instructions, the call target becomes the branch destination, and the instruction is no longer assumed to fall through. RETURN instructions become an indirect branch, and the decompiler will attempt to recover branch destinations using switch analysis.
CALL override: Treats the primary BRANCH or RETURN behavior of the instruction as if it were a CALL. A BRANCH becomes a fall through instruction, and the destination address becomes the call target, which may no longer be considered part of the function. The computed address for an indirect BRANCH or RETURN instruction becomes the target address of an indirect CALL.
CALL_RETURN override: Treats the primary BRANCH or RETURN behavior of the instruction as if it executed a CALL followed by a RETURN operation. The destination address of a BRANCH becomes the call target, which may no longer be considered part of the function. The computed address for an indirect BRANCH or RETURN instruction becomes the target address of an indirect CALL.
RETURN override: Treats an indirect BRANCH or CALL instruction as if it were a RETURN instruction, terminating the control-flow path within the function. The computed destination address is considered part of the return mechanism of the function and may no longer be explicitly displayed in the output. An indirect BRANCH no longer invokes switch analysis during decompilation.
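The four override behaviors can be summarized as a small lookup table. The sketch below is only a model of the descriptions above, not Ghidra code: `apply_flow_override` is a hypothetical helper, and the strings BRANCH, CALL, CALL_RETURN, and RETURN are used simply as labels for the four behaviors.

```python
# Primary flows each override applies to, following the descriptions above.
APPLIES_TO = {
    "BRANCH": {"CALL", "RETURN"},
    "CALL": {"BRANCH", "RETURN"},
    "CALL_RETURN": {"BRANCH", "RETURN"},
    "RETURN": {"BRANCH", "CALL"},  # indirect forms only
}

# Resulting operation sequence, and whether flow continues to the next instruction.
RESULT = {
    "BRANCH": (["BRANCH"], False),
    "CALL": (["CALL"], True),
    "CALL_RETURN": (["CALL", "RETURN"], False),
    "RETURN": (["RETURN"], False),
}

def apply_flow_override(flow, override, indirect=False):
    """Return (operations, falls_through), or None if the override is not relevant."""
    if flow not in APPLIES_TO[override]:
        return None
    if override == "RETURN" and not indirect:
        return None  # the RETURN override only covers indirect BRANCH or CALL
    return RESULT[override]

print(apply_flow_override("BRANCH", "CALL"))         # a JMP reinterpreted as a call
print(apply_flow_override("BRANCH", "CALL_RETURN"))  # a JMP reinterpreted as a tail call
```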
The decompiler automatically incorporates comments from the Program database into its output. Comments in Ghidra are centralized and can be created and displayed by multiple Program views, including the decompiler. Comments created from a decompiler window will show up in the Listing window for instance, and vice versa.
For the purposes of understanding comments within the decompiler, keep in mind that:
For general documentation on creating and editing comments within Ghidra, see Comments.
The decompiler collects and displays comments associated with any address in the formal function body currently decompiling. The comments are integrated line by line into the decompiled code, and an individual comment is displayed on the line before the line of code incorporating the instruction associated with the comment's address.
Because a single line of code typically encompasses multiple machine instructions, there is a possibility that multiple comments at different addresses apply to the same line. In this case, the decompiler displays each comment on its own line, in address order, directly before the line of code.
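As a rough illustration of this placement rule, the following sketch attaches comments to output lines by address. The data model (each output line carrying the set of instruction addresses it covers) and the function `interleave_comments` are assumptions made for the example; the decompiler's real bookkeeping is more involved.

```python
def interleave_comments(lines, comments):
    """Place each comment on its own line before the code line covering its address.

    lines: list of (code_text, set_of_instruction_addresses), in output order.
    comments: dict mapping address -> comment text.
    """
    out = []
    for text, addrs in lines:
        # Several comments may apply to one line; emit them in address order.
        for addr in sorted(a for a in addrs if a in comments):
            out.append("/* " + comments[addr] + " */")
        out.append(text)
    return out

lines = [
    ("x = a + b;", {0x1000, 0x1002}),
    ("return x;", {0x1004}),
]
comments = {0x1002: "second", 0x1000: "first", 0x1004: "done"}
result = interleave_comments(lines, comments)
print("\n".join(result))
```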
Because the output of the decompiler can be heavily transformed compared to the original machine instructions, it's possible that individual instructions no longer have explicit tokens representing them in the output. Comments attached to these instructions will still be displayed in the decompiler output with the closest associated line of code, usually within the same basic block.
By default, the decompiler displays only the Pre comments within the body of the function. It also displays Plate comments, but only if they are attached to the entry point of the function. In this case, they are displayed first in the decompiler output, along with WARNING comments, before the function declaration. Other comment types can be configured to display in decompiler output, by changing the decompiler Display options (See Display <kind-of> Comments).
Unlike the Listing window, the decompiler does not alter how a comment is displayed based on its type. All enabled types of comment are displayed in the same way, on a separate line before the line of code associated with the address.
The decompiler may decide as part of its analysis that individual basic blocks are unreachable and not display them in the output. In this case, any comments associated with addresses in the unreachable block will also not be displayed.
The decompiler can generate internal warnings during its analysis and will incorporate them into the output as comments in the same way as the user-defined comments described above. They are not part of Ghidra's comment system, however, and cannot be edited. They can be distinguished from normal comments by the word 'WARNING' at the beginning of the comment.
/* WARNING: ... */
Variable annotations are the most important way to get names and data-types that are meaningful to the user incorporated into the decompiler's output. A variable in this context is loosely defined as any piece of memory that code in the Program treats as a logical entity. The decompiler works to incorporate all forms of annotation into its output for any variable pertinent to the function being analyzed.
At a minimum, a variable annotation in Ghidra provides a storage location, a name, and a data-type.
Ghidra provides various ways that a name and other attributes can be ascribed to a variable. These break up roughly into global variables, defined directly on memory in the Program image, and variables that are local to a function.
Global variable annotations are created from the tool by applying a data-type to a memory location in the Listing window, either by invoking a command from the Data pop-up menu, or dragging a data-type from the Data Type Manager window directly onto the memory location. Refer to the documentation:
Local variable annotations are created from the Listing using various editor dialogs. See in particular:
The Decompiler window also provides numerous ways of annotating variables, both local and global. In particular, see the commands:
Ghidra maintains its own symbol table that supports namespaces and function scopes, and variable names are automatically incorporated into this. In order to widely accommodate different use cases, Ghidra's symbol table has extremely lax naming rules. Ghidra may allow names that conflict with the stricter rules of the language the decompiler is attempting to produce. The decompiler does not currently have an option that checks for this. Users should be aware of:
Ghidra symbols allow almost every printable character except a space in a symbol name; punctuation and keywords can be incorporated.
Ghidra allows different functions to have the same name, even within the same namespace, in order to model languages that support function overloading. In most languages, such functions would be expected to have distinct prototypes to allow the symbols to be distinguished in context. Ghidra and the decompiler however do not check for this, as prototypes may not be known.
All variables belong either to a global or local scope, which directly affects how the variable is treated in the decompiler's data-flow analysis. Annotations created by applying a data-type directly to a memory location in the listing are automatically added to the formal global namespace. Ghidra can create other custom namespaces that are considered global in this sense, and renaming actions provide options that let individual global annotations be moved into these namespaces. Dialogs that are brought up in the context of a function, like the Function Signature Editor, create variable annotations that are local to that function.
A global variable annotation forces the decompiler to treat the memory location as if its value persists beyond the end of the function. The variable must exist at all points of the function body, generally at the same memory location.
Local variables, in contrast, do not generally exist across the whole function, but come into scope at the instruction that first writes to them, and then exist only up to the last instruction that reads them. The memory location storing a local variable at one point of the function may be reused for different variables at other points. This can cause ambiguity in how the decompiler should treat a given memory location used for storing local variables, which the user may want to steer. See the discussion in “Variable Storage”.
Ghidra provides extensive support for naming and describing data-types that are tailored for the Program being analyzed. Data-types that are explicitly part of a variable annotation are, to the extent possible, automatically incorporated into the decompiler's analysis.
The decompiler understands traditional primitive data-types, in all their various sizes, like integers, floating-point numbers, booleans, and characters. It also understands pointers, structures, and arrays, letting it support arbitrarily complicated composite data-types. Ghidra provides some data-types with specialized display capabilities that don't have a natural representation in the high-level language output by the decompiler. The decompiler treats these as black-box data-types, preserving the name, but treating the underlying data either as an integer or simply as an array of bytes.
The undefined data-types are supported, in their various sizes: undefined1, undefined2, undefined4, etc. In Ghidra, the undefined data-types let the user specify the size of a variable, while formally declaring that other details about the data-type are unknown.
For the decompiler, undefined data-types as an annotation have the important special meaning that the decompiler should let its analysis determine the final data-type presented in the output for the variable (See “Forcing Data-types” below).
The void data-type is supported but treated specially by the decompiler, as it is by Ghidra in general. A void can be used to indicate the absence of a return value in function prototypes, but cannot be used as a general annotation on variables. A void pointer, void *, is possible; the decompiler treats it as a pointer to an unknown data-type.
Integer data-types, both signed and unsigned, are supported up to a size of 8 bytes. Larger sizes are supported internally but are generally represented as an array of bytes in decompiler output. Odd integer sizes are also supported.
The standard C data-type names: int, short, long, and long long are mapped to specific sizes based on the processor and compiler selected when importing the Program.
Floating-point sizes of 4, 8, 10, and 16 are supported, mapping in all cases currently to the float, double, float10, and float16 data-types respectively. The decompiler currently cannot display floating-point constants that are bigger than 8 bytes.
ASCII or Unicode encoded character data-types are supported for sizes of 1, 2, and 4. The size effectively chooses between the UTF8, UTF16, and UTF32 character encodings respectively. The standard C data-type names char and wchar_t are mapped to one of these sizes based on the processor and compiler selected when importing the Program.
Terminated strings, encoded either in ASCII or Unicode, are supported. The decompiler converts Ghidra's dedicated string data-types like string to an "array of characters" data-type, such as char[], where the character size matches the encoding. A "pointer to character" data-type like char * or wchar_t * is also treated as a potential string reference. The decompiler can infer terminated strings if this kind of data-type propagates to constant values during its analysis.
Strings should be fully rendered in decompiler output, with non-printable characters escaped using either traditional sequences like '\r', '\n' or using Unicode escape sequences like '\xFF'.
Pointer data-types are fully supported. A pointer to any other supported data-type is possible. The data-type being pointed to, whether it's a primitive, structure, or another pointer, informs how the decompiler renders a dereferenced pointer. The decompiler assumes that a pointer variable may refer to an array of the underlying data-type and will use array notation if there is evidence of more than one element.
The default pointer size is set based on the processor and compiler selected when the Program is imported and generally matches the size of the ram (or equivalent) address space. Different pointer sizes within the same Program are possible. The decompiler generally expects the pointer size to match the size of the address space being pointed to, but individual architectures can model different size pointers into the space (such as near pointers).
For processors with more than one memory address space, pointer data-types currently cannot be directly annotated to indicate a preferred address space. Where there is ambiguity, the decompiler attempts to determine the correct address space from the context of its use within the function.
Array data-types are fully supported. The array element can be any other supported data-type with a fixed size.
Structured data-types are fully supported. The decompiler does not automatically infer structures when analyzing a function; it propagates structured data-types into the function from explicitly annotated sources, like input parameters or global variables. Decompiler directed creation of structures can be triggered by the user, see “Auto Create Structure”.
Enumerations are fully supported. The decompiler can propagate enumerations from explicitly annotated sources throughout a function onto constants, which are then displayed with the appropriate label from the definition of the enumeration. If the constant does not match a single value in the enumeration definition, the decompiler attempts to build a matching value by or-ing together multiple labels. The decompiler can be made to break out constants representing packed flags, for instance, by labeling individual bit values within an enumeration.
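The label-matching behavior can be illustrated with a short sketch. The source does not describe the algorithm the decompiler actually uses, so the greedy strategy below is an assumption, and `render_enum_constant` and the example labels are hypothetical.

```python
def render_enum_constant(value, labels):
    """Render a constant using enumeration labels (dict of name -> integer value)."""
    # A constant matching a single label is displayed with that label.
    for name, label_value in labels.items():
        if label_value == value:
            return name
    # Otherwise, try to cover every set bit by or-ing labels together (greedy choice).
    remaining = value
    parts = []
    for name, label_value in sorted(labels.items(), key=lambda kv: kv[1], reverse=True):
        if label_value != 0 and (remaining & label_value) == label_value:
            parts.append(name)
            remaining &= ~label_value
    if remaining == 0 and parts:
        return " | ".join(parts)
    return hex(value)  # no combination of labels reproduces the constant

# Hypothetical packed-flag enumeration with one label per bit.
PERMS = {"READ": 1, "WRITE": 2, "EXEC": 4}
print(render_enum_constant(5, PERMS))
```

Labeling individual bit values, as in `PERMS`, is what lets a packed constant be broken out into its separate flags.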
A Function Definition in Ghidra is a data-type that encodes information about the parameters and return value for a generic/unspecified function. A formal function pointer is supported by the decompiler as a pointer data-type that points to a Function Definition. A Function Definition specifically encodes:
The Function Definition itself does not encode any storage information. Once the Function Definition is associated with a Program, the indicator maps to one of the prototype models for the specific processor and compiler. A Function Definition is currently limited to a prototype model with one of the following names:
The decompiler performs type propagation as part of its analysis on functions. Data-type information is collected from variable annotations (and other sources), which is then propagated via data-flow throughout the function to other variables and constants where the data-type may not be immediately apparent.
With few exceptions, a variable annotation is forcing on the decompiler in the sense that the storage location being annotated is considered an unalterable data-type source. During type propagation, the data-type may propagate to other variables, but the variable representing the storage location being annotated is guaranteed to have the given name and that data-type; it will not be overridden.
Users should be aware that variable annotations are forcing on the decompiler and may directly override aspects of its analysis. Because of this, variable annotations are the most powerful way for the user to affect decompiler output, but setting an incomplete (or incorrect) data-type as part of an annotation may produce poorer decompiler output.
The major exception to forcing annotations is if the data-type in the annotation is undefined. Ghidra reserves the following names to represent formally undefined data-types:
These allow annotations to be made even when the user doesn't have information about a variable's data-type. The number in the name only specifies the number of bytes in the variable.
The decompiler views a variable annotation with an undefined data-type only as an indication of what name should be used if a variable at that storage address exists. The data-type for the variable is filled in, using type propagation from other sources.
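The difference between a forcing annotation and an undefined one can be modeled in a few lines. This is a simplified sketch: `resolve_variable` is a hypothetical helper, and only names of the form `undefined` followed by a byte count are treated as undefined here.

```python
import re

UNDEFINED_NAME = re.compile(r"^undefined\d+$")  # e.g. undefined1, undefined4

def resolve_variable(annotation, propagated_type):
    """Return the (name, data_type) displayed for an annotated storage location.

    annotation: (name, data_type) taken from the variable annotation.
    propagated_type: data-type inferred by type propagation, or None if nothing was inferred.
    """
    name, data_type = annotation
    if UNDEFINED_NAME.match(data_type) and propagated_type is not None:
        return name, propagated_type  # the name is kept; the type comes from propagation
    return name, data_type            # a defined data-type is forcing and is never overridden

print(resolve_variable(("count", "int"), "char *"))
print(resolve_variable(("local_14", "undefined4"), "float"))
```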
For annotations that specifically label a function's formal parameters or return value, the Signature Source also affects how they're treated by the decompiler. If the Signature Source is set to anything other than DEFAULT, there is a forced one-to-one correspondence between variable annotations and actual parameters in the decompiler's view of the function. This is stronger than just forcing the data-type; the existence (or not) of the variable itself is forced by the annotation in this case. If the Signature Source is forcing and there are no parameter annotations, a void prototype is forced on the function.
A forcing Signature Source is set typically if debug symbols for the function are read in during Program import (IMPORTED), or if the user manually edits the function prototype directly (USER_DEFINED).
If an annotation and the Signature Source force a parameter to exist, specifying an undefined data-type in the annotation still directs the decompiler to fill in the variable's data-type using type propagation. The same holds true for the return value; an undefined annotation fixes the size of the return value, but the decompiler fills in its own data-type.
The decompiler may still use an undefined data-type to label a variable, even after type propagation. If a variable is simply copied around within a function and there are no other substantive operations or annotations on the variable, the decompiler may decide the undefined data-type is appropriate.
Every variable annotation is associated with a single storage location, where the value of the variable is stored during execution: generally a register, stack location, or an address in the load image of the Program. The storage location does not necessarily hold the value for that variable at all points of execution, and it's possible for the variable value to be held in different storage locations at different points of execution. The set of execution points where the storage location does hold the variable value is called the annotation scope; this is distinct from (but influenced by) the scope of the variable itself. The different types of storage location are listed below.
A load-image address is a concrete address in the load image of the Program, typically in the ram address space. This kind of storage must be backed by a formal memory block for the Program, which typically corresponds to a specific program section (such as the .text or .bss section). Because it is in the load image directly, an annotation with this storage shows up directly in the Listing window and can be directly manipulated there. In much of the Ghidra documentation, these annotations are referred to as Data. See the section Data in particular.
Although specific architectures may vary, generally a storage location at a load image address represents a formal global variable, and the annotation is in scope across all Program execution. For the decompiler, the storage location is treated as a single persistent variable in all functions that reference it. Within a function, all distinct references to the storage location (varnodes) are merged. The decompiler expects a value at the storage location to exist from before the start of the function, and any change to the value must be explicitly represented as an assignment to the variable in decompiler output.
A stack address is an address in the stack frame of a particular function in the Program. Formally, a stack address is defined as an offset relative to the incoming value of the stack pointer and exists in the stack address space associated with the function. See the discussion in “Address Space”. A stack annotation then is a variable annotation with a stack address as its storage location. It exists only in the scope of a single function and the variable must be local to that function.
Within the Listing window, a stack annotation is displayed as part of the function header (at the entry point address of the function), with a syntax similar to:
undefined4 Stack[-0x14]:4 local_14
The middle field (the Variable Location field) indicates that the storage location is on the stack, and the value in brackets indicates the offset of the storage location, relative to the incoming stack pointer. The value after the colon indicates the number of bytes in the storage location.
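To make the field layout concrete, here is a small parser for a header line of this shape. It is a sketch for illustration only: the regular expression assumes a data-type name without spaces and a decimal byte count, and `parse_stack_annotation` is a hypothetical helper, not part of any Ghidra API.

```python
import re

# data-type, Stack[offset]:size, name (layout taken from the example line above)
STACK_ANNOTATION = re.compile(
    r"^(?P<datatype>\S+)\s+"
    r"Stack\[(?P<offset>-?0x[0-9a-fA-F]+)\]:(?P<size>\d+)\s+"
    r"(?P<name>\S+)$"
)

def parse_stack_annotation(line):
    """Split a stack annotation line into its fields, or return None if it does not match."""
    m = STACK_ANNOTATION.match(line.strip())
    if m is None:
        return None
    return {
        "datatype": m.group("datatype"),
        "offset": int(m.group("offset"), 16),  # relative to the incoming stack pointer
        "size": int(m.group("size")),          # number of bytes in the storage location
        "name": m.group("name"),
    }

info = parse_stack_annotation("undefined4 Stack[-0x14]:4 local_14")
print(info)
```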
Currently, the entire body of the function is included in the scope of any stack annotation, and the decompiler will allow only a single variable to exist at the stack address. A stack annotation can be a formal parameter to the function, but otherwise the decompiler does not expect to see a value that exists before the start of the function.
The decompiler will continue to perform copy propagation and other transforms on stack locations associated with a variable annotation. In particular, within decompiler output, a specific write operation to a stack address may not show up as an explicit assignment to its variable, if the value is simply copied to another location.
A variable annotation can refer to a specific register for the processor associated with the Program. In general, such an annotation will be for a variable local to a particular function. Within the Listing window, this annotation is displayed as part of the function header, with syntax like:
int EAX:4 iVar1
The Variable Location field displays the name of the particular register attached to the annotation, and the value after the colon indicates the number of bytes in the register.
For a local variable annotation with a register storage location, there is an expectation that the register may be reused for different variables at different points of execution within the function. There may be more than one annotation, for different variables, that share the same register storage location. An annotation is associated with a first use point that describes where the register first holds a value for the particular variable. (See the discussion in “Varnodes in the Decompiler”.) The entire scope of the annotation is limited to the address regions between the first use point and any points where the value is read. The decompiler may extend the scope as part of its merging process, but the full extent is not stored in the annotation.
Variable annotations can have a temporary register as a storage location. A temporary register is not specific to a processor but is produced at various stages of the decompilation process. See the discussion of the unique space in “Address Space”. These registers do not have a meaningful name, and the specific storage address may change on successive decompilations. So within the Listing window, this annotation is displayed as part of the function header, with syntax like:
int HASH:5f96367122:4 iVar2
The Variable Location field displays the internal hash used to uniquely identify the temporary register within the data-flow of the function.
A temporary register annotation must be for a local variable, and as with an ordinary register, the annotation is associated with a first use point that describes where the temporary register first holds a value for the variable.
Every formal Function in Ghidra is associated with a set of variable annotations and other properties that make up the function prototype. Due to the nature of reverse engineering, the function prototype may only include partial information and may be built up over time. Individual elements include:
Each formal input to the function can have a Variable Annotation that describes its name, data-type, and storage location, at the moment control-flow enters the function. If annotations exist, they are shown in the Listing Window as part of the Function header, and they usually correspond directly with symbols in the function declaration produced by the decompiler.
The value returned by a function can have a special Variable Annotation that describes its data-type and storage location, at the moment control-flow exits the function. If it exists, the annotation is shown in the Listing Window as part of the Function header with the name <RETURN>, and it usually corresponds directly with the return value in the function declaration produced by the decompiler.
The calling convention used by the function can be specified as part of the function prototype. The convention is specified by name, referring to the formal “Prototype Model” that describes how storage locations are selected for individual parameters along with other information about how the compiler treats the function.
In the absence of parameter and return value annotations, the decompiler will use the prototype model as part of its analysis to discover the input parameters and the return value of the function.
The name "unknown" is reserved to indicate that nothing is known about the calling convention. If set to "unknown", depending on context, the decompiler may assign the calling convention based on the Prototype Evaluation option (See Prototype Evaluation), or it may use the default calling convention for the architecture.
Functions have a boolean property called variable arguments, which can be turned on if the function is capable of being passed a variable number of inputs. This property informs the decompiler that the function may take additional parameters beyond any with an explicit variable annotation. This affects decompilation of any function which calls the variable arguments function, allowing the decompiler to discover unlisted parameters at a given call site.
A function can be marked explicitly as not returning, meaning that once a call is made to the function, execution will never return to the caller. The decompiler uses this to compute the correct control-flow in any calling functions.
If the boolean property in-line is turned on for a particular function, it directs the decompiler to inline the effects of the function into the decompilation of any of its calling functions. The function will no longer appear as a direct function call in the decompilation, but all of its data-flow will be incorporated into the calling function.
This is useful for bookkeeping functions, where it's important for the decompiler to see their effects on the calling function. Functions that set up the stack frame for a caller or functions that look up or dispatch a switch destination are typical examples that should be marked in-line.
This property is similar in spirit to marking a function as in-line. A call-fixup directs the decompiler to replace any call to the function with a specific chunk of raw p-code. The decompilation of any calling function no longer shows the function call, but the chunk of p-code incorporates the called function's effects.
Call-fixups are more flexible than just inlining a function. The call-fixup chunk can be tailored to incorporate all of, just a part of, or something different to the behavior of the function.
Call-fixups are specified by name. The name and associated p-code chunk are typically defined in the compiler specification for the Program.
Ghidra records a Signature Source for every function, indicating the origin of its prototype information. This is similar to the Symbol Source attached to Ghidra's symbol annotations (See the documentation for Filtering in the Symbol Table). The possible types are:
Upon import of the Program, if there are debugging symbols available, Ghidra will build annotations of the function's parameters and set the Signature Source type to IMPORTED. Otherwise, it will generally be set to DEFAULT.
However, Ghidra adjusts the Signature Source for a function if there is any change to the prototype. In particular, if the user adds, removes, or edits variable annotations for the function's parameters or return value, Ghidra automatically converts the Signature Source to be USER_DEFINED.
If the Signature Source is set to anything other than DEFAULT, the function's prototype information is forcing on the decompiler. See the discussion in “Forcing Data-types”.
The input parameter and return value annotations of the function prototype, like any variable annotations, can be forcing on the decompiler. See the complete discussion in “Forcing Data-types”. But keep in mind:
The input parameters and return value are all forced on the decompiler as a unit based on the Signature Source. They are all forced if the type is set to anything other than DEFAULT; otherwise none of them are forced.
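The all-or-nothing rule can be sketched as a small standalone Python model. This is not Ghidra API code; the function name and the string representation of annotations are hypothetical and chosen only for illustration.

```python
def forced_annotations(signature_source, parameters, return_value):
    """Return the prototype annotations the decompiler must honor.

    Toy model: the prototype is forced as a unit. Either every
    parameter and the return value are forced, or none are.
    """
    if signature_source == "DEFAULT":
        # Nothing is forced; the decompiler discovers the prototype
        # itself using the calling convention.
        return []
    # Any other Signature Source forces the whole prototype.
    return list(parameters) + [return_value]


print(forced_annotations("DEFAULT", ["int a", "char *b"], "int"))
print(forced_annotations("USER_DEFINED", ["int a", "char *b"], "int"))
```

There is deliberately no per-parameter switch in the sketch, mirroring the fact that forcing is decided by the Signature Source alone.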
If the function prototype's annotations are not forced, the decompiler will attempt to discover the parameters and return value using the calling convention. The prototype model underlying the calling convention dictates which storage locations can be considered as parameters and their formal ordering.
If there are parameter or return value annotations that do not agree with the calling convention that has been set, the function prototype is said to be using custom storage. Using the Function Editor Dialog for instance, any storage location can be specified as a parameter, and a completely custom prototype can be built for the function.
The decompiler will disregard the calling convention's rules in this situation and use the custom storage locations for parameters and the return value. Other aspects of the calling convention, like the unaffected list, will still be used.
Mutability is a description of how values in a specific memory region (either a single variable or a larger block) can change during Program execution, based either on properties or established rules. Ghidra recognizes the mutability settings:
Mutability affects decompiler analysis and can have a large impact on the output.
Most memory has normal mutability, meaning: the value at the memory location may change over the course of executing the Program, but for a given section of code, the value will not change unless an instruction explicitly writes to it.
Mutability can be set on an entire block of memory in the Program, typically from the Memory Map. It can also be set as part of a single Variable Annotation. From the Listing Window for instance, use the Settings dialog.
The constant mutability setting indicates that values within the memory region are read-only and don't change during Program execution. If a read-only variable is accessed in a function being analyzed by the decompiler, its constant value, if present in the Program's load image, replaces the variable within data-flow for the function. The decompiler may propagate the constant and fold it in to other operations, which can have a substantial impact on the final output.
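As a rough illustration of the replacement step, the following standalone Python sketch models a read from memory. The function name, the dictionary used as a load image, and the mutability strings are assumptions made for this example; they do not correspond to Ghidra API names.

```python
def resolve_read(address, mutability, load_image):
    """Return the constant that replaces a read, or None.

    Toy model: a read is replaced only when the region is marked
    constant AND a value is present in the load image.
    """
    if mutability == "constant" and address in load_image:
        return load_image[address]
    # Otherwise the read stays a variable in the data-flow.
    return None


# Hypothetical load image: address -> initialized value
image = {0x4000: 16}

value = resolve_read(0x4000, "constant", image)
if value is not None:
    # Once the read is a constant, later operations can be folded.
    print(value + 4)
```

With normal mutability the same read returns None in this model, so no folding takes place and the variable remains visible in the output.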
The volatile mutability setting indicates that values within the memory region may change unexpectedly, even if the code currently executing does not directly write to it. If a volatile variable is accessed in a function being analyzed by the decompiler, each specific access is replaced with a built-in function call, which prevents constant propagation and other transforms across the access. The built-in functions are named based on whether the access is a read or write and then the size of the access. Within decompiler output, the first parameter to a built-in function is a symbol indicating the volatile variable. The function returns a value in the case of a volatile read or takes a second parameter in the case of a volatile write.
write_volatile_1(DAT_mem_002b,0x20);
X = read_volatile_2(SREG);
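The naming scheme for these built-in functions can be reproduced with a short standalone Python sketch. The helper name `volatile_access` is hypothetical, and the hexadecimal rendering of the written value is an assumption that simply matches the example output shown here.

```python
def volatile_access(symbol, size, value=None):
    """Render a volatile access the way it appears in the output.

    size  -- size of the access in bytes
    value -- value being written, or None for a read
    """
    kind = "read" if value is None else "write"
    name = f"{kind}_volatile_{size}"
    if value is None:
        # A read takes only the symbol and returns the value.
        return f"{name}({symbol})"
    # A write takes the written value as a second parameter.
    return f"{name}({symbol},{value:#x})"


print(volatile_access("DAT_mem_002b", 1, 0x20) + ";")
print("X = " + volatile_access("SREG", 2) + ";")
```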
Ghidra provides some ways to control how specific constants shown in disassembly are formatted or displayed. These annotations are attached to constants as operands of specific machine instructions. To the extent possible, the decompiler applies these annotations to the matching constant in the decompiler output. The constant may be transformed from its value in the original machine instruction during the decompiler's analysis. The decompiler will follow the constant through simple transformations, but if the transformed value is too far from the original value, the annotation will not be applied. The transforms followed are:
Ghidra can create an association between a name and a constant, called an equate. The constant must be a constant operand of a specific machine instruction, and the equate is applied directly to the operand from the Listing Window using the Set Equate menu. Once applied, the equate's name is displayed instead of the numeric representation of the constant. Equates across the entire Program can be viewed from the Equate Table.
When analyzing a function, the decompiler attempts to follow any constant in the function with an attached equate to the matching constant in the final output. If successful, the equate's name is printed instead of the numeric form of the matching constant. If the constant was transformed from its original value, the matching constant is printed as an expression, where the transforming operations are applied to the equate symbol (representing the original constant).
Ghidra can apply a format conversion to override how an integer operand is displayed in a specific machine instruction. Conversions are generally applied from the Listing Window using the Convert menu option. When analyzing a function containing a machine instruction that has a format conversion applied, the decompiler will attempt to trace the constant to a matching constant in the final output. If successful, the format conversion is also applied to the matching constant.
Conversions applied by the decompiler are currently limited to:
An appropriate header matching the format is prepended to the representation string, either "0b", "0x" or just "0". The decompiler will not switch the signedness of the constant but preserves the signed or unsigned data-type as determined by analysis.
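The header rule for the three prefixed formats can be sketched in standalone Python. The function and format names are hypothetical, and the sketch handles only non-negative values so that it makes no claim about how signed constants are rendered.

```python
# Header and Python format code for each prefixed representation.
HEADERS = {
    "binary": ("0b", "b"),
    "hex": ("0x", "x"),
    "octal": ("0", "o"),
}


def apply_conversion(value, fmt):
    """Render a non-negative integer with the header for fmt."""
    if value < 0:
        raise ValueError("sketch covers non-negative values only")
    header, code = HEADERS[fmt]
    return header + format(value, code)


for fmt in ("binary", "hex", "octal"):
    print(fmt, apply_conversion(32, fmt))
```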
A register value in this context is a region of code in the Program where a specific register holds a known constant value. Ghidra maintains an explicit list of these values for the Program (see the documentation for Register Values), which the decompiler can use when analyzing a function. A register value benefits decompiler analysis, especially if the original compiler was aware of the constant value, as the decompiler can recover address references calculated as offsets relative to the register and otherwise propagate the constant.
A register value is set by highlighting the region of code in the Listing Window and then invoking the Set Register Values ... command from the pop-up menu. The beginning and end of a region is indicated in the Listing Window with assume directives, and regions can be generally viewed from the Register Manager window.
In order for a particular register value to affect decompilation, the region of code associated with the value must contain the entry point of the function, and of course the function must read from the register. Only the initial reads of the register are replaced with the constant value. The decompiler will continue to respect later instructions that write to the register (even if the instruction is inside the register value's region). If a register value's region starts in the middle of a function, decompilation is not affected at all.
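These conditions can be modeled with a small standalone Python sketch. It is a simplification that assumes straight-line code and inclusive address ranges; the function names and the tuple layout for regions are hypothetical and are not part of the Ghidra API.

```python
def register_value_for_function(entry_point, regions):
    """Return the register value in effect at the entry point.

    regions -- list of (start, end, value) with inclusive bounds.
    A region that begins after the entry point never applies, even
    if it covers later parts of the function body.
    """
    for start, end, value in regions:
        if start <= entry_point <= end:
            return value
    return None


def replace_initial_reads(accesses, value):
    """Replace reads of the register up to the first write.

    accesses -- sequence of "read" / "write" in execution order.
    """
    result = []
    written = False
    for access in accesses:
        if access == "write":
            written = True
            result.append("write")
        elif written:
            # Reads after a write see the written value instead.
            result.append("read")
        else:
            result.append(value)
    return result


regions = [(0x1000, 0x1FFF, 0x8000)]
value = register_value_for_function(0x1000, regions)
print(replace_initial_reads(["read", "read", "write", "read"], value))
```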
There is a special class of registers, called context registers, whose values have a different effect on analysis and decompilation than described above.
Context registers are inputs to the disassembly decoding process and directly affect which machine instructions are created.
The value in a context register is examined when Ghidra decodes machine instructions from the underlying bytes in the Program. A specific value generally corresponds to a specific execution mode of the processor. The ARM processor T bit for instance, which selects whether the processor is executing ARM or THUMB instructions, is modeled as a context register in Ghidra. The same set of bytes in the Program can be decoded to machine instructions in more than one way, depending on context register values.
Bytes are typically decoded once using context register values established at the time of disassembly. From Ghidra's more static view of execution, a context register holds only a single value at any point in the code, but the same context register can hold different values for different regions of code. Setting a new value on a region of the Program will affect any subsequent disassembly of code within that region.
If a context register value is changed for a region that has already been disassembled, in order to see the effect of the change, the machine instructions in the region need to be cleared, and disassembly needs to be triggered again. See the documentation on the Clear Plugin.
Values for a context register are set in the same way as any other register, using the Set Register Values ... command described above. Within the Register Manager window, context registers are generally grouped together under the (pseudo-register) heading, contextreg. For details about how context registers are used in processor modeling, see the document "SLEIGH: A Language for Rapid Processor Specification".
Because context registers affect machine instructions, they also affect the underlying p-code and have a substantial impact on decompilation. Although details vary by processor, context register values are typically established during the initial import and analysis of a Program and aren't changed frequently.