Internals and Optimization of Erlang Binary Construction (Part 1)

xiaoxiao2021-03-01  11

This article builds on "Internal Structure and Classification of Erlang Binaries". It examines the situations in which the Erlang runtime system's optimizations for binary construction can take full effect.

The example below is taken from the official documentation; the C source is added to explain the internal mechanism of binary construction in more detail.

    Bin0 = <<0>>,                    %% 1
    Bin1 = <<Bin0/binary,1,2,3>>,    %% 2
    Bin2 = <<Bin1/binary,4,5,6>>,    %% 3
    Bin3 = <<Bin2/binary,7,8,9>>,    %% 4
    Bin4 = <<Bin1/binary,17>>,       %% 5 !!!
    {Bin4,Bin3}                      %% 6

Line 1 creates a heap binary. As described in "Internal Structure and Classification of Erlang Binaries", a heap binary is stored directly on the process heap and is at most 64 bytes; anything larger than 64 bytes is created as a reference-counted binary (refc binary).

Line 2 is a binary append operation, which calls the erts_bs_append function in erl_bits.c. The C source, with annotations, follows (the // comments are annotations, the /* */ comments come from the source):

    Eterm
    erts_bs_append(Process* c_p, Eterm* reg, Uint live, Eterm build_size_term,
                   Uint extra_words, Uint unit)
    {
        Eterm bin;                      /* Given binary */
        Eterm* ptr;
        Eterm hdr;
        ErlSubBin* sb;
        ProcBin* pb;
        Binary* binp;
        Uint heap_need;
        Uint build_size_in_bits;
        Uint used_size_in_bits;
        Uint unsigned_bits;
        ERL_BITS_DEFINE_STATEP(c_p);

        // Number of bits to be built: build_size_in_bits
        if (is_small(build_size_term)) {
            Sint signed_bits = signed_val(build_size_term);
            if (signed_bits < 0) {
                goto badarg;
            }
            build_size_in_bits = (Uint) signed_bits;
        } else if (term_to_Uint(build_size_term, &unsigned_bits)) {
            build_size_in_bits = unsigned_bits;
        } else {
            c_p->freason = unsigned_bits;
            return THE_NON_VALUE;
        }

        bin = reg[live];
        if (!is_boxed(bin)) {
        badarg:
            c_p->freason = BADARG;
            return THE_NON_VALUE;
        }
        ptr = boxed_val(bin);
        // Fetch the header word of the binary
        hdr = *ptr;
        if (!is_binary_header(hdr)) {
            goto badarg;
        }
        // #MARK_A
        if (hdr != HEADER_SUB_BIN) {
            // Not a sub binary: not writable
            goto not_writable;
        }
        sb = (ErlSubBin *) ptr;
        if (!sb->is_writable) {
            // is_writable == 0: not writable
            goto not_writable;
        }
        pb = (ProcBin *) boxed_val(sb->orig);
        // Must be a refc binary
        ASSERT(pb->thing_word == HEADER_PROC_BIN);
        if ((pb->flags & PB_IS_WRITABLE) == 0) {
            // Explicitly flagged as not writable
            goto not_writable;
        }

        /*
         * OK, the binary is writable.
         */

        erts_bin_offset = 8*sb->size + sb->bitsize;
        if (unit > 1) {
            if ((unit == 8 && (erts_bin_offset & 7) != 0) ||
                (erts_bin_offset % unit) != 0) {
                goto badarg;
            }
        }
        used_size_in_bits = erts_bin_offset + build_size_in_bits;
        // Mark the old sub binary as no longer writable, because the space
        // following it is about to be written
        // #MARK_B
        sb->is_writable = 0;    /* Make sure that no one else can write. */
        // Extend to the required size
        pb->size = NBYTES(used_size_in_bits);
        pb->flags |= PB_ACTIVE_WRITER;

        /*
         * Reallocate the binary if it is too small.
         */
        binp = pb->val;
        // If the container is too small, reallocate it to twice the required size
        if (binp->orig_size < pb->size) {
            Uint new_size = 2*pb->size;
            binp = erts_bin_realloc(binp, new_size);
            binp->orig_size = new_size;
            // Note: after reallocation the pb->val pointer changes,
            // so this binary must not be referenced from elsewhere at this point
            pb->val = binp;
            pb->bytes = (byte *) binp->orig_bytes;
        }
        erts_current_bin = pb->bytes;

        /*
         * Allocate heap space and build a new sub binary.
         */
        reg[live] = sb->orig;
        heap_need = ERL_SUB_BIN_SIZE + extra_words;
        if (c_p->stop - c_p->htop < heap_need) {
            (void) erts_garbage_collect(c_p, heap_need, reg, live+1);
        }
        // Create a new sub binary that points at the start of the original binary.
        // Compared with the old sub binary, only the size is extended to the
        // required value
        sb = (ErlSubBin *) c_p->htop;   // written at the heap top
        // The heap top advances by ERL_SUB_BIN_SIZE words (htop is an Eterm*;
        // the original note gives 20 bytes, presumably for a 32-bit build)
        c_p->htop += ERL_SUB_BIN_SIZE;
        sb->thing_word = HEADER_SUB_BIN;
        sb->size = BYTE_OFFSET(used_size_in_bits);
        sb->bitsize = BIT_OFFSET(used_size_in_bits);
        sb->offs = 0;
        sb->bitoffs = 0;
        // The newest sub binary is marked writable.
        // In a series of append operations, only the last sub binary is writable
        sb->is_writable = 1;
        sb->orig = reg[live];

        return make_binary(sb);

        /*
         * The binary is not writable. We must create a new writable binary and
         * copy the old contents of the binary.
         */
     not_writable:
        {
            Uint used_size_in_bytes; /* Size of old binary + data to be built */
            Uint bin_size;
            Binary* bptr;
            byte* src_bytes;
            Uint bitoffs;
            Uint bitsize;
            Eterm* hp;

            /*
             * Allocate heap space.
             */
            heap_need = PROC_BIN_SIZE + ERL_SUB_BIN_SIZE + extra_words;
            if (c_p->stop - c_p->htop < heap_need) {
                (void) erts_garbage_collect(c_p, heap_need, reg, live+1);
                bin = reg[live];
            }
            hp = c_p->htop;

            /*
             * Calculate sizes. The size of the new binary, is the sum of the
             * build size and the size of the old binary. Allow some room
             * for growing.
             */
            ERTS_GET_BINARY_BYTES(bin, src_bytes, bitoffs, bitsize);
            erts_bin_offset = 8*binary_size(bin) + bitsize;
            if (unit > 1) {
                if ((unit == 8 && (erts_bin_offset & 7) != 0) ||
                    (erts_bin_offset % unit) != 0) {
                    goto badarg;
                }
            }
            used_size_in_bits = erts_bin_offset + build_size_in_bits;
            used_size_in_bytes = NBYTES(used_size_in_bits);
            bin_size = 2*used_size_in_bytes;
            // At least 256 bytes
            bin_size = (bin_size < 256) ? 256 : bin_size;

            /*
             * Allocate the binary data struct itself.
             */
            // Create a binary of twice the required size (minimum 256 bytes).
            // It serves as a container and lives outside the process heap;
            // the process heap only holds the refc binary that references it
            bptr = erts_bin_nrml_alloc(bin_size);
            bptr->flags = 0;
            bptr->orig_size = bin_size;
            erts_refc_init(&bptr->refc, 1);
            erts_current_bin = (byte *) bptr->orig_bytes;

            /*
             * Now allocate the ProcBin on the heap.
             */
            // Create the refc binary that references the container above,
            // and store it on the process heap
            pb = (ProcBin *) hp;
            hp += PROC_BIN_SIZE;
            pb->thing_word = HEADER_PROC_BIN;
            // Set to the size actually needed for now; later appends can extend it
            pb->size = used_size_in_bytes;
            pb->next = MSO(c_p).first;
            MSO(c_p).first = (struct erl_off_heap_header*)pb;
            pb->val = bptr;
            pb->bytes = (byte*) bptr->orig_bytes;
            pb->flags = PB_IS_WRITABLE | PB_ACTIVE_WRITER;
            OH_OVERHEAD(&(MSO(c_p)), pb->size / sizeof(Eterm));

            /*
             * Now allocate the sub binary and set its size to include the
             * data about to be built.
             */
            // Create the sub binary that references the refc binary above,
            // and set it to the required size
            sb = (ErlSubBin *) hp;
            hp += ERL_SUB_BIN_SIZE;
            sb->thing_word = HEADER_SUB_BIN;
            sb->size = BYTE_OFFSET(used_size_in_bits);
            sb->bitsize = BIT_OFFSET(used_size_in_bits);
            sb->offs = 0;
            sb->bitoffs = 0;
            sb->is_writable = 1;
            sb->orig = make_binary(pb);

            c_p->htop = hp;

            /*
             * Now copy the data into the binary.
             */
            copy_binary_to_buffer(erts_current_bin, 0, src_bytes, bitoffs, erts_bin_offset);

            return make_binary(sb);
        }
    }

As #MARK_A in the code above shows, a binary that is not a sub binary jumps to not_writable. Bin0 is a heap binary, so line 2 takes this path: it creates the required container, refc binary and sub binary, and copies the contents of Bin0 (see the comments in the not_writable section for details), in preparation for the append.

    Bin0 = <<0>>,                    %% 1
    Bin1 = <<Bin0/binary,1,2,3>>,    %% 2
    Bin2 = <<Bin1/binary,4,5,6>>,    %% 3
    Bin3 = <<Bin2/binary,7,8,9>>,    %% 4
    Bin4 = <<Bin1/binary,17>>,       %% 5 !!!
    {Bin4,Bin3}                      %% 6

In line 3, Bin1 is the result of the most recent append, so the space following it is free and can be extended. Bin1 itself can never change again, so it is not copied; 4, 5 and 6 are simply appended after it. Line 4 is executed in the same way as line 3.
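The sizing arithmetic in the listing can be isolated into a few small functions. The following is a minimal sketch, not ERTS code: every `toy_` name is hypothetical, and `toy_nbytes` assumes that NBYTES rounds a bit count up to whole bytes, which is how the listing uses it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers; only the arithmetic mirrors the listing. */

/* Whole bytes needed to hold `bits` bits (assumed behaviour of NBYTES). */
size_t toy_nbytes(size_t bits)
{
    return (bits + 7) / 8;
}

/* not_writable path: the new container gets twice the used size,
 * but never less than 256 bytes. */
size_t toy_initial_capacity(size_t used_bytes)
{
    size_t cap = 2 * used_bytes;
    return cap < 256 ? 256 : cap;
}

/* Writable path: the container is left alone if the data still fits,
 * otherwise it is reallocated to twice the needed size. */
size_t toy_grown_capacity(size_t capacity, size_t needed_bytes)
{
    return capacity < needed_bytes ? 2 * needed_bytes : capacity;
}

/* Unit check done before appending: the current bit offset must be a
 * multiple of `unit`, otherwise the append fails with badarg. */
int toy_offset_ok(size_t offset_bits, size_t unit)
{
    assert(unit >= 1);
    return offset_bits % unit == 0;
}
```

Under these assumptions, line 2 of the example needs 4 bytes, so the container is allocated with 256 bytes, and lines 3 and 4 fit into it without any reallocation.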

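Line 5 appends to Bin1, not to Bin3. Bin1 is no longer the result of the most recent append, which means the space following it already holds other data (here 4, 5, 6, 7, 8, 9 are stored after Bin1). The execution therefore differs from the previous two lines: the not_writable path is taken again, a new container, refc binary and sub binary are created, Bin1 is copied into the new container, and 17 is appended after it.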

How does the runtime know that no more data can be appended after Bin1? The documentation raises the same question: We will not explain here how the run-time system can know that it is not allowed to write into Bin1; it is left as an exercise to the curious reader to figure out how it is done by reading the emulator sources, primarily erl_bits.c.

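The answer can be found in the append function above. When line 3 was executed, the sub binary that Bin1 refers to was already marked as not writable (see #MARK_B), so the is_writable check after #MARK_A sends line 5 to not_writable.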
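The hand-off of the writable flag can be reproduced with a toy model. This is a sketch under simplifying assumptions, not the ERTS implementation: all `Toy` names are hypothetical, sizes are whole bytes only, there is no garbage collection or reference counting, and allocations are never freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the off-heap container plus its ProcBin. */
typedef struct {
    unsigned char *bytes;   /* container data */
    size_t size;            /* bytes in use */
    size_t capacity;        /* bytes allocated */
} ToyProcBin;

/* Stand-in for a binary term: a heap binary when orig == NULL,
 * otherwise a sub binary that views the first `size` bytes of orig. */
typedef struct {
    ToyProcBin *orig;
    const unsigned char *heap_bytes;  /* used only when orig == NULL */
    size_t size;
    int is_writable;                  /* 1 only on the newest append result */
} ToyBin;

const unsigned char *toy_data(const ToyBin *b)
{
    return b->orig != NULL ? b->orig->bytes : b->heap_bytes;
}

/* Append n bytes to src; *copied reports which path was taken. */
ToyBin toy_append(ToyBin *src, const unsigned char *extra, size_t n, int *copied)
{
    ToyBin out;
    ToyProcBin *pb;
    size_t used = src->size + n;

    if (src->orig != NULL && src->is_writable) {
        /* Writable path: reuse the container and write after src. */
        pb = src->orig;
        src->is_writable = 0;               /* corresponds to #MARK_B */
        if (pb->capacity < used) {
            pb->capacity = 2 * used;
            pb->bytes = (unsigned char *) realloc(pb->bytes, pb->capacity);
            assert(pb->bytes != NULL);
        }
        *copied = 0;
    } else {
        /* not_writable path: new container, copy the old contents. */
        pb = (ToyProcBin *) malloc(sizeof *pb);
        assert(pb != NULL);
        pb->capacity = 2 * used < 256 ? 256 : 2 * used;
        pb->bytes = (unsigned char *) malloc(pb->capacity);
        assert(pb->bytes != NULL);
        memcpy(pb->bytes, toy_data(src), src->size);
        *copied = 1;
    }
    memcpy(pb->bytes + src->size, extra, n);
    pb->size = used;

    out.orig = pb;
    out.heap_bytes = NULL;
    out.size = used;
    out.is_writable = 1;    /* only the newest result may be extended in place */
    return out;
}
```

Replaying the six-line example against this model, the appends on lines 2 and 5 take the copying path and the appends on lines 3 and 4 write in place, which matches the behaviour described above. Bin1 still sees only its own 4 bytes because its size field never changes.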

Continued in: Internals and Optimization of Erlang Binary Construction (Part 2)

When reposting, please credit the original address: https://www.6miu.com/read-3650015.html
