A brief look at how zval changed between PHP5 and PHP7

In this article I try to describe, in plain language, how zval differs between the two versions. The summary is at the end.

Let's begin

Structure of zval:

  1. type
  2. value
  3. refcount: the number of symbols (variables, including references) that point to this value; the garbage collector also relies on this count. Integer.
  4. is_ref: whether the value is a reference, true|false
<?php
	$a = 1;
	xdebug_debug_zval('a');
?>

	Under PHP5, the following is printed:
	a: (refcount=1, is_ref=0)=1

	PHP7: 
	a: (refcount=0, is_ref=0)int 1

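In this example, PHP7 reports a refcount of 0 for the integer 1: the value is not reference counted at all. This is one of the changes to zval.
In PHP5 every value, scalars included, was reference counted. In PHP7 the simple scalars (booleans, integers, floating-point numbers) are stored directly inside the zval, so they are no longer counted and need no separate memory allocation. Strings are a partial exception: they are still separate, refcounted values, but string literals written in the source code are interned and skip the counting, which is what the following examples show.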
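
For readers without Xdebug, PHP's built-in debug_zval_dump() gives a similar view. The following is only a minimal sketch, assuming PHP 7.0 or later; the helper name dump_to_string is made up for this example. It captures the dump of a few simple scalar values and records whether a refcount shows up in it.

```php
<?php
// Hypothetical helper (not part of PHP): capture what debug_zval_dump() prints.
function dump_to_string($value)
{
    ob_start();
    debug_zval_dump($value);
    return trim(ob_get_clean());
}

// Integers, floats, booleans and null live directly inside the zval in PHP 7+,
// so their dump is expected to carry no "refcount(...)" part.
$samples = array('int' => 1, 'float' => 1.5, 'bool' => true, 'null' => null);
$hasRefcount = array();
foreach ($samples as $label => $value) {
    $dump = dump_to_string($value);
    $hasRefcount[$label] = (strpos($dump, 'refcount(') !== false);
    echo $label, ': ', $dump, PHP_EOL;
}
```

Under PHP5 the same dump would include a refcount for each of these values, which matches the difference seen in the Xdebug output.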

<?php
	$a = "new string";
	xdebug_debug_zval('a');
?>

	PHP5: 
		a: (refcount=1, is_ref=0)='new string'

	PHP7: 
		a: (refcount=1, is_ref=0)string 'new string' (length=10)
<?php
	$a = "new string";
	$b = $a;
	xdebug_debug_zval( 'a' );
?>

	PHP5: 
		a: (refcount=2, is_ref=0)='new string'

	PHP7: 
		a: (refcount=1, is_ref=0)string 'new string' (length=10)

Notice that under PHP7 the count does not increase after the second assignment.
The string here is a literal, which PHP7 interns. An interned string is shared without reference counting, so the displayed count stays the same no matter how many variables use it.
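
The count stays flat above because the string is a literal. A string built at run time behaves differently, which the sketch below tries to show with the built-in debug_zval_dump() instead of Xdebug. Assumptions: PHP 7.0 or later, and the helper name shown_refcount is invented here. The absolute numbers include the extra references created by passing the value through two function calls, so only the difference between the two readings is meaningful.

```php
<?php
// Hypothetical helper: return the refcount printed by debug_zval_dump(),
// or null when the dump shows none (for example non-refcounted values).
function shown_refcount($value)
{
    ob_start();
    debug_zval_dump($value);
    $dump = ob_get_clean();
    return preg_match('/refcount\((\d+)\)/', $dump, $m) ? (int) $m[1] : null;
}

// Built at run time, so it is not a compile-time literal.
$a = 'new string ' . mt_rand();
$before = shown_refcount($a);

$b = $a;                        // no copy is made: both variables share one string
$after = shown_refcount($a);

echo 'before: ', $before, ', after: ', $after, PHP_EOL;
```

If the second reading is one higher than the first, the assignment only added a reference to the existing string.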

<?php
	$a = "new string";
	$c = $b = $a;           // three variables now share one string
	xdebug_debug_zval( 'a' );
	unset( $a, $b );        // drop two of the three variables
	xdebug_debug_zval( 'c' );
?>

	PHP5: 
		a: (refcount=3, is_ref=0)='new string'
		c: (refcount=1, is_ref=0)='new string'

	PHP7: 
		a: (refcount=1, is_ref=0)string 'new string' (length=10)
		c: (refcount=1, is_ref=0)string 'new string' (length=10)

Here, after unset($a, $b), PHP5 lowers the refcount from 3 to 1, because only $c still points to the string. Under PHP7 the displayed count stays at 1 the whole time.

<?php
	$a = array( 'meaning' => 'life', 'number' => 42 );
	xdebug_debug_zval( 'a' );
	$a['test']='abc';
	xdebug_debug_zval( 'a' );
?>

	PHP5: 
		a: (refcount=1, is_ref=0)=array (
		   'meaning' => (refcount=1, is_ref=0)='life',
		   'number' => (refcount=1, is_ref=0)=42
		)
		a: (refcount=1, is_ref=0)=array (
		   'meaning' => (refcount=1, is_ref=0)='life',
		   'number' => (refcount=1, is_ref=0)=42,
		   'test' => (refcount=1, is_ref=0)='abc',
		)

	PHP7: 
		a: (refcount=2, is_ref=0)
		array (size=2)
		  'meaning' => (refcount=1, is_ref=0)string 'life' (length=4)
		  'number' => (refcount=0, is_ref=0)int 42
		a: (refcount=1, is_ref=0)
		array (size=3)
		  'meaning' => (refcount=2, is_ref=0)string 'life' (length=4)
		  'number' => (refcount=0, is_ref=0)int 42
		  'test' => (refcount=1, is_ref=0)string 'abc' (length=3)

Here the initial array has a refcount of 2 under PHP7, while the modified array has a refcount of 1. A member of the PHP development team explains it as follows:

For arrays the not-refcounted variant is called an "immutable array". If you use opcache, then constant array literals in your code will be converted into immutable arrays. Once again, these live in shared memory and as such must not use refcounting. Immutable arrays have a dummy refcount of 2, as it allows us to optimize certain separation paths.
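
The "separation" mentioned in the quote is copy-on-write: assigning an array only shares it, and a real copy is made at the first write. One rough way to observe this is to watch memory_get_usage() around an assignment and a write. This sketch assumes PHP 7.0 or later with the default Zend memory manager; the function name measure_separation and the array size are arbitrary choices for illustration.

```php
<?php
function measure_separation()
{
    $big = range(1, 100000);            // a large array to make the copy visible
    $start = memory_get_usage();

    $copy = $big;                       // shares the same array, no copy yet
    $afterAssign = memory_get_usage();

    $copy[] = 0;                        // first write: the array is separated (copied)
    $afterWrite = memory_get_usage();

    return array(
        'assign' => $afterAssign - $start,
        'write'  => $afterWrite - $afterAssign,
    );
}

$delta = measure_separation();
echo 'bytes used by the assignment: ', $delta['assign'], PHP_EOL;
echo 'bytes used by the first write: ', $delta['write'], PHP_EOL;
```

The assignment should cost almost nothing, while the first write should cost roughly the size of the whole array.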

<?php
	$a = array( 'meaning' => 'life', 'number' => 42 );
	xdebug_debug_zval( 'a' );
	$a['life'] = $a['meaning'];
	xdebug_debug_zval( 'a' );
	$a['house'] = $a['meaning'];
	xdebug_debug_zval( 'a' );
	unset($a['meaning'], $a['life']);
	xdebug_debug_zval( 'a' );
?>

	PHP5: 
		a: (refcount=1, is_ref=0)=array (
		   'meaning' => (refcount=1, is_ref=0)='life',
		   'number' => (refcount=1, is_ref=0)=42,
		)
		a: (refcount=1, is_ref=0)=array (
		   'meaning' => (refcount=2, is_ref=0)='life',
		   'number' => (refcount=1, is_ref=0)=42,
		   'life' => (refcount=2, is_ref=0)='life'
		)
		a: (refcount=1, is_ref=0)=array (
		   'meaning' => (refcount=3, is_ref=0)='life',
		   'number' => (refcount=1, is_ref=0)=42,
		   'life' => (refcount=3, is_ref=0)='life',
		   'house' => (refcount=3, is_ref=0)='life',
		)
		a: (refcount=1, is_ref=0)=array (
		   'number' => (refcount=1, is_ref=0)=42,
		   'house' => (refcount=1, is_ref=0)='life'
		)

	PHP7: 
		a: (refcount=2, is_ref=0)
		array (size=2)
		  'meaning' => (refcount=1, is_ref=0)string 'life' (length=4)
		  'number' => (refcount=0, is_ref=0)int 42
		  
		a: (refcount=1, is_ref=0)
		array (size=3)
		  'meaning' => (refcount=3, is_ref=0)string 'life' (length=4)
		  'number' => (refcount=0, is_ref=0)int 42
		  'life' => (refcount=3, is_ref=0)string 'life' (length=4)
		  
		a: (refcount=1, is_ref=0)
		array (size=4)
		  'meaning' => (refcount=4, is_ref=0)string 'life' (length=4)
		  'number' => (refcount=0, is_ref=0)int 42
		  'life' => (refcount=4, is_ref=0)string 'life' (length=4)
		  'house' => (refcount=4, is_ref=0)string 'life' (length=4)
		
		a: (refcount=1, is_ref=0)
		array (size=2)
		  'number' => (refcount=0, is_ref=0)int 42
		  'house' => (refcount=2, is_ref=0)string 'life' (length=4)

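A new difference shows up here: under PHP7, the first time the string in the array is reused, its refcount goes up by 2 instead of 1.
I have not yet found the reason for this or what it means. Note, though, that in the previous example the refcount of 'meaning' already rose from 1 to 2 as soon as the array was modified, so one of the two increments seems to come from the array being separated rather than from the assignment. I will fill this in after further study.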

<?php
	$a = array( 'one' );
	$a[] =& $a;             // the array now holds a reference to itself
	xdebug_debug_zval( 'a' );
	unset($a);              // removes the symbol; the array still has refcount 1
	xdebug_debug_zval( 'a' );
?>

	PHP5: 
		a: (refcount=2, is_ref=1)=array (
		   0 => (refcount=1, is_ref=0)='one',
		   1 => (refcount=2, is_ref=1)=&array
		)
		a: no such symbol

	PHP7: 
		a: (refcount=2, is_ref=1)
		array (size=2)
		  0 => (refcount=2, is_ref=0)string 'one' (length=3)
		  1 => (refcount=2, is_ref=1)
		    &array<
		a: no such symbol

A digression on the GC (garbage collection) mechanism: PHP 5.3 and later ship an improved GC that can deal with circular references.
In the example above, the second element of the array is a reference back to the array itself, so the array is pointed to twice: once by $a and once by its own element.

After unset($a), only the reference held by the element remains, so the refcount drops to 1 instead of 0.

At this point no variable points to the array any more, but because its refcount is not 0 it cannot be freed by reference counting alone. It becomes so-called garbage and causes a memory leak.
If you sometimes see PHP-FPM holding a very large amount of memory that only grows and never shrinks, leaks like this are one possible cause.
Such values are placed in the garbage collector's buffer. Once the buffer reaches a certain threshold, the GC walks through it and cleans up these leftovers.
So how does PHP decide whether a zval might be garbage?

  1. If the refcount of a zval increases, the zval is still in use. It is definitely not garbage and does not enter the garbage collector's buffer.
  2. If the refcount of a zval drops to 0, the zval is freed immediately. It is not garbage for the GC to handle and does not enter the buffer.
  3. If the refcount of a zval is decreased but is still greater than 0, the zval cannot be freed yet. It may be garbage, so it is put into the buffer.
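
The circular-reference case can be checked directly with gc_collect_cycles(), which runs the collector immediately and returns how many cycles it collected. This is a minimal sketch assuming PHP 5.3 or later; the function name collect_self_referencing_array is made up for illustration.

```php
<?php
function collect_self_referencing_array()
{
    gc_enable();
    gc_collect_cycles();        // flush anything already waiting in the buffer

    $a = array('one');
    $a[] =& $a;                 // the array now contains a reference to itself
    unset($a);                  // refcount drops to 1, not 0: rule 3 applies

    return gc_collect_cycles(); // force a collection run
}

$freed = collect_self_referencing_array();
echo 'collected: ', $freed, PHP_EOL;
```

A result greater than zero means the array was unreachable garbage that reference counting alone could not free.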

A question

One question runs through the whole article: if scalar data types are not counted, how are they cleaned up?

Here is my understanding:
In C, memory that the system manages automatically (such as local variables) lives on the stack, while memory that the programmer allocates explicitly lives on the heap.
PHP7 changed things so that some variables are stored on the stack. My guess is that, because these scalars are small and compact, keeping the value directly inside the zval is faster and more efficient. It also leaves nothing separately allocated for the garbage collector to clean up: the value disappears together with the zval that holds it.

Summary

PHP7 changes to zval:

  1. refcount is stored in a different place: it moved from the zval itself into the value that the zval points to. The advantage is that incrementing and decrementing it is faster.
  2. PHP7 stores some variables on the stack.
  3. The simple scalar types in PHP7 (booleans, integers, floating-point numbers) are no longer counted and need no separate memory allocation. String literals are interned and are not counted either.

For those who ask what the point is

  • Q:
    What's the point of knowing this?
  • A:
    First, on the question of meaning: it helps you build your own way of thinking about programming. Exploring areas you don't know also broadens your horizons and keeps you grounded. Maybe it even helps prevent Alzheimer's disease?
    Second, from a programmer's point of view, 99% of the companies in the world are business-driven, while 1% are technology-driven.
    In the 1%, you first need a strong education and the ability to understand one field very thoroughly. That is depth.
    In the 99%, the business pays for your meals. However thoroughly you know a language, that alone brings little benefit to the company, so this kind of knowledge matters less there. What you need is to learn to use more existing tools, such as MongoDB, Hadoop, Kafka, k8s, Sphinx, and so on. That is breadth.
    Which way you go depends on your own situation.

If you have any comments or suggestions, please send me a private message. Feedback is welcome.

Posted by exploo on Wed, 25 May 2022 11:04:37 +0300