I’m working on understanding pointers in C by breaking down the language into three key areas: syntax, semantics, and idioms. To grasp pointers, I’ve been focusing on fundamental concepts: what’s an expression, what’s a value, what’s evaluation, and what’s assignment. Here’s my current understanding, followed by a specific question about the LHS in assignments.
Background:
An expression is something that needs to be evaluated, like 10 + (15 - 2 * 5), which step-by-step becomes 15.
A value is what evaluates to itself, e.g., ⟦15⟧ = 15 or ⟦<addr>⟧ = <addr> (using ⟦x⟧ as shorthand for "x evaluates to").
Evaluation is reducing an expression to a value, following rules like: a function executes only when its arguments are values (e.g. if ⟦x⟧ = 15, sqrt(x + 1) → sqrt(15 + 1) → sqrt(16) → 4).
Assignment maps a name on the left to a value on the right, where the right side must evaluate to a value first (e.g., int a = 5 + 3 → int a = 8 → 8, and as a side effect the value is stored). [Side note: I remember reading that the 8 is the "side effect" of assignment, but it seems to me that storing the value in memory is the side effect.]
For variables and pointers:
A variable on the right side evaluates to its stored value:
int a = 5; ⟦a⟧ → 5.
Pointers store addresses (values), and operators like & and * have evaluation rules: ⟦&a⟧ → <addr>, ⟦*ptr⟧ → value.
This mental model worked well for simple examples:
int x = 70; // 70 is a value because it evaluates to itself ⟦70⟧ → 70
int* p = &x; // &x is not a value, we have to evaluate it. ⟦&x⟧ → <addr>
int a = *p + 1; // (*p + 1) is an expression and not a value
// evaluate it. + expects its operands to be values, so we evaluate *p.
// ⟦*p⟧ → 70 subsequently ⟦70 + 1⟧ → 71
// and finally, a is an alias for the value 71, which is stored in memory.
The Problem: I’ve been treating assignment as "evaluate the right side to a value, forget about the left side", which worked well, but then I struggled with this example:
int arr[5] = {1, 2, 3, 4, 5};
int *ptr = arr;
*(ptr + 1) = *ptr; // becomes *(ptr + 1) = 1
To me, the assignment *(ptr + 1) = 1 made it look as if we have to evaluate the left side! I failed to find a mental model that explains this statement and how it works.
So, in brief, the LHS (*(ptr + 1)) seems to require evaluation, unlike a simple variable. Why does this happen, and how does it fit with evaluating expressions to values? I’d like an explanation aligning with my semantics model.
6 Answers
Consider how you would execute a = 3+4. It is easy to compute the value of 3+4; you simply add 3 and 4, producing 7. Then you store 7 in a. How do you do that? You must figure out where a is. That is, you must determine which memory has been reserved for a. Evaluating the left side of an = operation determines where the value is to be stored.
C 2024 6.5.1 tells us:
An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof.
Note that a “sequence” of operators and operands includes trivial sequences that are just a single operand. In the declaration char c[4];, there is just one expression, 4. It specifies the value 4.
The sequence 3 + 4 specifies computation of the sum of the operands 3 and 4, each of which is also an expression.
a designates an object, and main designates a function.
a = 3 generates the side effect of storing 3 in a (and also computes the value 3).
In *(ptr + 1) = *ptr, the left side is evaluated:
- * is an operator. Its operand (ptr + 1) must be evaluated:
  - ( … ) is an operator with operand ptr + 1. ( … ) merely evaluates its operand, producing the operand’s value as its result:
    - + is an operator with operands ptr and 1:
      - ptr designates an object. Evaluating it produces the value in that object. (See “Lvalue conversion” below.)
      - 1 specifies the value 1.
    - The + is completed by adding the value of ptr and 1. This arithmetic is done in units of the pointed-to type, int, so it produces the address of the int one beyond where ptr points.
  - ( … ) is completed by producing the result of the +.
- * is completed by producing the lvalue corresponding to the computed address.
Thus, the result of *(ptr + 1) in *(ptr + 1) = *ptr is an lvalue for one beyond where ptr points. ptr points to the first element of arr, so *(ptr + 1) is an lvalue for arr[1].
That is how the left side of the assignment is computed (in the abstract machine described by the C standard).
Then *(ptr + 1) = *ptr stores the value of *ptr in arr[1].
Lvalue conversion
Above, we saw that evaluation of ptr produced the value in ptr, but that did not happen for the lvalue *(ptr + 1); it was used to store a value instead. This is because of a general rule of expression evaluation in C 2024 6.3.3.1:
Except when it is the operand of the sizeof operator, or the typeof operators, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue); this is called lvalue conversion.
The ptr in ptr + 1 is not the operand of sizeof, a typeof operator, unary &, ++, or --, and it is not the left operand of . or an assignment operator. So it is “converted” to the value stored in ptr. This conversion is performed by loading the value from the memory reserved for ptr (in the abstract machine).
*(ptr + 1) is the left operand of =, so it is not converted. It remains an lvalue, and the rules for = in C 2024 6.5.17.1 say it is used to store the value of the right operand:
An assignment operator stores a value in the object designated by the left operand.
Array conversion
Notice that in int *ptr = arr;, ptr is initialized with the address of the first element in arr, not with the value of arr. This is because lvalue conversion does not happen for arrays. Instead, a different rule applies:
Except when it is the operand of the sizeof operator, or typeof operators, or the unary & operator, or is a string literal used to initialize an array, an expression that has type “array of type” is converted to an expression with type “pointer to type” that points to the initial element of the array object and is not an lvalue.
Supplement
The term lvalue refers to its historic place on the left side of an assignment. In x = y, x and y are both variables, but they are used very differently. For y, the value recorded in the variable is retrieved from memory. For x, the value is stored in the variable. The term lvalue refers to an expression that may designate an object, so it can appear on the left side of an assignment. Whether the object’s value is read or written depends on how the lvalue is used in an expression.
If we ignore optimization, a compiler that is slavishly following the abstract model performs largely the same work for an address and an lvalue. ptr + 1 is the address of arr[1]. *(ptr + 1) is an lvalue for arr[1]. Having an lvalue means we have what we need to access the object, which means we know its address. The data the program has for ptr + 1 is the same as the data it has for *(ptr + 1); it is the address of arr[1]. The difference between them is the type of the expression and other metadata the compiler has about them. ptr + 1 has type int * and is not an lvalue. *(ptr + 1) has type int and is an lvalue. The type of an expression and the fact of whether it is an lvalue or not tell the compiler how to treat the expression.
To me, the assignment *(ptr + 1) = 1 made it look as if we have to evaluate the left side!
That is absolutely correct. The left side has to be evaluated to determine which object is to be assigned to.
Regarding an expression, that term covers a broad range in C. It is defined in the C standard section 6.5p1 as follows:
An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof. The value computations of the operands of an operator are sequenced before the value computation of the result of the operator.
One consequence of this is that an assignment is just an expression that uses the assignment operator. For example, the following are valid expressions using assignment:
a = b = 1
x = (y = z + 1) + 4
3 + (x = 4) + 2
This works because the stored value in an assignment is the value of the assignment expression itself. This is spelled out in section 6.5.13p2 as follows:
An assignment operator stores a value in the object designated by the left operand. An assignment expression has the value of the left operand after the assignment
The important part here is that the left side of an assignment is an lvalue, i.e. an expression that designates an object. This is most commonly just the name of a variable, however a dereferenced pointer (i.e. the result of the unary * operator) designates an object and is therefore an lvalue as well.
So given your example:
int arr[5] = {1, 2, 3, 4, 5};
int *ptr = arr;
*(ptr + 1) = *ptr;
The value 1 is added to the pointer ptr. This results in a pointer to the next object of type int, namely the 2nd element of arr. That pointer is then dereferenced to yield an lvalue for the 2nd element of arr, and it is that element that the value *ptr is assigned to.
The last statement can also be written as follows:
ptr[1] = *ptr;
This probably makes it clearer what it's doing. As it turns out, array indexing is just shorthand for pointer addition and dereference. In other words, *(E1 + E2) is exactly equivalent to E1[E2].
First of all, "value" isn't "something that evaluates to itself". A value is a meaning associated with the bits representing the object. E.g. if you have int x = 42;, the bits in x represent the number 42, so its value is the number 42.
A literal number in the source code, such as 42 in the example above, is called a "literal" or a "constant" (in C++ and C respectively).
The result of evaluation isn't just a value. It can have an address associated with it (if it's an lvalue, which is a compile-time property of expressions). E.g. x and 42 have the same value, but x has an address associated with it (&x compiles) and is an lvalue, while 42 doesn't have an associated address (&42 doesn't compile) and is an rvalue.
This associated address is unused in some cases (not needed for arithmetic for example), and is required in others (for &, =, etc).
Assignment maps a name on the left to a value on the right
Assignment is a command that tells the computer to modify an object (simply speaking) (i.e. change the bits representing it).
Don't confuse it with equality in math. The variable can be later set to a different value.
a = 10;
a = 20;
makes perfect sense in C/C++ but would be a contradiction in math.
Assignment maps a name on the left to a value on the right
This is compatible with the rules of C, so long as you have a sufficiently expansive definition of "name", and note that your term "value" isn't the same as what the C standard terms "value". Using standard terminology, it would be
Assignment maps an lvalue on the left to an rvalue on the right
So to match your terminology, anything that denotes an object is a name, and anything that has no more subexpressions is a value. If, when trying to evaluate a value, you think you are finished, but have a name, then you copy the named object as your value.
With that, you can have a simple mental model of pointers.
- Dereferencing a pointer names the pointed-to object.
- Taking the address of a name creates a pointer value.
- Adding N to a pointer creates a pointer value that is pointing to another object that is N away.
The important thing to understand is that C has two kinds of values -- lvalues and rvalues. When an expression is evaluated, which kind of value it is evaluated as depends on the context:
- on the left side of an assignment, or as the operand of a unary & operator, an expression will be evaluated as an lvalue, and the result will be a location.
- on the right side of an assignment, or as an operand of most[1] any operator, an expression will be evaluated as an rvalue and the result will be a "value" as you describe it. Whenever someone says "value" when talking about C[2], they can be understood as talking about an rvalue.
Conceptually, a "value" is a bunch of bits (data), along with a type that determines how those bits should be interpreted, while a "location" is a place that can store a bunch of bits, along with a type that determines how those bits should be interpreted.
So when an assignment is evaluated, the left side is evaluated as an lvalue, while the right side is evaluated as an rvalue, and then the value is stored into the location. Those two evaluations are not sequenced -- they may happen in either order, or may even happen simultaneously, so any side effects in either may occur in any order or may even "collide" resulting in undefined behavior.
With the unary & and * operators, similar things occur:
- unary & in an rvalue context will evaluate its operand in an lvalue context, and then give the address of the location.
- unary * in an rvalue context will evaluate its operand in an rvalue context, then fetch the data from the address.
- unary * in an lvalue context will evaluate its operand in an rvalue context, and then give the location corresponding to the address.
Note that a "location" and an "address" are not the same thing -- not all locations have addresses, though all (valid) addresses denote a location.
[1] Besides the unary &, there are also things like sizeof which evaluate their operand as an "unevaluated" expression, which is neither an lvalue nor an rvalue.
[2] This is not the case when talking about C++, which has its own meaning of "rvalue", which is quite different from C.
Applying the Refined Model to *(ptr + 1) = *ptr;
Let's break it down with int arr[5] = {1, 2, 3, 4, 5}; and int *ptr = arr; (so ptr points to arr[0]).
Evaluate RHS (*ptr):
The expression *ptr is on the RHS, so we need its rvalue.
First, evaluate ptr. ptr is a variable (an lvalue). In an rvalue context, it undergoes lvalue conversion. ⟦ptr⟧ → <address of arr[0]>.
Now evaluate *<address of arr[0]>. The * operator (dereference) reads the value at the given address.
⟦*ptr⟧ → 1 (the value stored in arr[0]). So, value_R is 1.
Evaluate LHS (*(ptr + 1)):
The expression *(ptr + 1) is on the LHS, so we need to find the location it designates (an lvalue).
First, evaluate the expression inside the parentheses: ptr + 1.
ptr evaluates to its value: ⟦ptr⟧ → <address of arr[0]>.
1 evaluates to 1.
Pointer arithmetic: ⟦<address of arr[0]> + 1⟧ → <address of arr[1]>. (The address is incremented by 1 * sizeof(int)).
Now evaluate the * operator applied to this address, in an lvalue context. The expression *<address of arr[1]> designates the memory location at that address.
⟦*(ptr + 1)⟧ → location corresponding to arr[1]. So, location_L is the memory slot for arr[1].
Store:
Store value_R (which is 1) into location_L (the memory for arr[1]).
The effect is that arr[1] now contains the value 1. The array becomes {1, 1, 3, 4, 5}.
Your original model was mostly correct but incomplete regarding the LHS of assignment. The LHS isn't ignored; it is evaluated, but its evaluation yields a location (lvalue), not necessarily a data value like the RHS. Expressions like *p or arr[i] or *(ptr + 1) can be lvalues – they designate specific, modifiable memory locations. Evaluating them on the LHS means figuring out which location they designate, potentially involving calculations like pointer arithmetic.
Think of it this way:
RHS Evaluation: "What is the value?"
LHS Evaluation: "What is the destination address/location?"
= just like / or any other binary operator: it has two operands which must both be evaluated. The only difference is that the left operand must evaluate to an lvalue. – Nate Eldredge Commented Mar 26 at 22:01
arr[1] = 5; requires evaluating arr[1] to get the array element that needs to be updated. – Barmar Commented Mar 26 at 22:04
0xAABBCC to 0xAABBCF. stackoverflow/questions/26129586/l-value-vs-r-value-in-c – jabroni Commented Mar 26 at 22:31