Zlib Compression Library: Compression Ratio and Performance Tests (1)

xiaoxiao, 2021-02-28

zlib 1.2.6 is a compression library developed by Jean-loup Gailly and Mark Adler. It is based on LZ77 (Ziv, Jacob, and Abraham Lempel. "A universal algorithm for sequential data compression." IEEE Transactions on Information Theory 23.3 (1977): 337-343) and supports in-memory compression in the deflate format (RFC 1951), the zlib format (RFC 1950), and the gzip format (RFC 1952).
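The three formats can usually be told apart by their first two bytes. The sketch below is an illustration only: the names guess_format and stream_format are hypothetical and not part of zlib, and the checks follow the header layouts given in RFC 1950 and RFC 1952.

```c
#include <stddef.h>

/* Hypothetical helper, not a zlib API: guess the container format
   from the first two bytes of a compressed buffer. */
typedef enum { FMT_UNKNOWN, FMT_ZLIB, FMT_GZIP } stream_format;

stream_format guess_format(const unsigned char *buf, size_t len)
{
    if (len < 2)
        return FMT_UNKNOWN;

    /* RFC 1952: a gzip stream starts with the magic bytes 0x1f 0x8b. */
    if (buf[0] == 0x1f && buf[1] == 0x8b)
        return FMT_GZIP;

    /* RFC 1950: the low nibble of CMF is 8 (deflate) and
       CMF * 256 + FLG is a multiple of 31. */
    if ((buf[0] & 0x0f) == 8 && ((buf[0] << 8) | buf[1]) % 31 == 0)
        return FMT_ZLIB;

    /* Raw deflate (RFC 1951) has no header, so it cannot be
       reliably recognised this way. */
    return FMT_UNKNOWN;
}
```

A raw deflate stream may happen to start with bytes that pass the zlib check, so the result is a hint rather than a guarantee.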


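1. deflate format: contains only the compressed data, with no header or trailer; it is the payload that the other two formats wrap. Note that deflateInit(), used below, writes the zlib wrapper by default; raw deflate output requires deflateInit2() with a negative windowBits.

Compression functions:

1) Initialize the compression state, binding a z_stream structure and a compression level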

    typedef struct z_stream_s {
        z_const Bytef *next_in;  /* next input byte */
        uInt     avail_in;       /* number of bytes available at next_in */
        uLong    total_in;       /* total number of input bytes read so far */

        Bytef    *next_out;      /* next output byte should be put there */
        uInt     avail_out;      /* remaining free space at next_out */
        uLong    total_out;      /* total number of bytes output so far */

        z_const char *msg;       /* last error message, NULL if no error */
        struct internal_state FAR *state; /* not visible by applications */

        alloc_func zalloc;       /* used to allocate the internal state */
        free_func  zfree;        /* used to free the internal state */
        voidpf     opaque;       /* private data object passed to zalloc and zfree */

        int     data_type;       /* best guess about the data type: binary or text */
        uLong   adler;           /* adler32 value of the uncompressed data */
        uLong   reserved;        /* reserved for future use */
    } z_stream;

    int deflateInit (z_streamp strm, int level);

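z_streamp strm: pointer to the z_stream structure

int level: the compression level, a number from 0 to 9. Level 0 stores the data without compressing it, level 1 is the fastest, and level 9 is the slowest but gives the best compression. Z_DEFAULT_COMPRESSION selects the library default.

2) Compress the data: consume input described by the stream structure and write the result to the output buffer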

int deflate (z_streamp strm, int flush);

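z_streamp strm: the associated stream structure

int flush: the flush mode, for example Z_NO_FLUSH while more input will follow and Z_FINISH for the final call

3) Finish compression and free the resources held by the stream structure

int deflateEnd (z_streamp strm);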

Decompression functions:

1) Initialize the decompression state

int inflateInit (z_streamp strm);

2) Decompress the data

int inflate (z_streamp strm, int flush);

3) Finish decompression and free the resources held by the stream structure

int inflateEnd (z_streamp strm);

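For usage of the compression and decompression functions, see example.c, which demonstrates three cases: small buffers, large buffers, and compression with a preset dictionary.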

    /* hello, zalloc, zfree and CHECK_ERR are defined earlier in example.c */
    void test_deflate(Byte *compr, uLong comprLen)
    {
        z_stream c_stream; /* compression stream */
        int err;
        uLong len = (uLong)strlen(hello)+1;

        c_stream.zalloc = zalloc;
        c_stream.zfree = zfree;
        c_stream.opaque = (voidpf)0;

        err = deflateInit(&c_stream, Z_DEFAULT_COMPRESSION);
        CHECK_ERR(err, "deflateInit");

        c_stream.next_in  = (Bytef*)hello;
        c_stream.next_out = compr;

        while (c_stream.total_in != len && c_stream.total_out < comprLen) {
            c_stream.avail_in = c_stream.avail_out = 1; /* force small buffers */
            err = deflate(&c_stream, Z_NO_FLUSH);
            CHECK_ERR(err, "deflate");
        }
        /* Finish the stream, still forcing small buffers: */
        for (;;) {
            c_stream.avail_out = 1;
            err = deflate(&c_stream, Z_FINISH);
            if (err == Z_STREAM_END) break;
            CHECK_ERR(err, "deflate");
        }

        err = deflateEnd(&c_stream);
        CHECK_ERR(err, "deflateEnd");
    }

    void test_inflate(Byte *compr, uLong comprLen, Byte *uncompr, uLong uncomprLen)
    {
        int err;
        z_stream d_stream; /* decompression stream */

        strcpy((char*)uncompr, "garbage");

        d_stream.zalloc = zalloc;
        d_stream.zfree = zfree;
        d_stream.opaque = (voidpf)0;

        d_stream.next_in  = compr;
        d_stream.avail_in = 0;
        d_stream.next_out = uncompr;

        err = inflateInit(&d_stream);
        CHECK_ERR(err, "inflateInit");

        while (d_stream.total_out < uncomprLen && d_stream.total_in < comprLen) {
            d_stream.avail_in = d_stream.avail_out = 1; /* force small buffers */
            err = inflate(&d_stream, Z_NO_FLUSH);
            if (err == Z_STREAM_END) break;
            CHECK_ERR(err, "inflate");
        }

        err = inflateEnd(&d_stream);
        CHECK_ERR(err, "inflateEnd");

        if (strcmp((char*)uncompr, hello)) {
            fprintf(stderr, "bad inflate\n");
            exit(1);
        } else {
            printf("inflate(): %s\n", (char *)uncompr);
        }
    }

    ... ... (code for the large-buffer and dictionary cases omitted)

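This post compares compression ratio and compression performance for inputs of different sizes. The test machine is configured as follows: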


Intel Xeon E312xx (Sandy Bridge) | SMP | x86_64 x86_64 x86_64 GNU/Linux


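The input data is constructed as follows:

    #define alloc_size 10

    int generate_data(char *data)
    {
        int i = 0;
        for (i = 0; i < alloc_size; ++i) {
            /* This digit pattern is not representative; construct data to match your use case. */
            data[i] = i % 10 + '0';
        }
        data[alloc_size] = '\0';
        return 0;
    }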

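Inputs of 10, 100, and 1500 bytes were each tested 100 times, with the following results (times are totals for the 100 runs):

    Bytes   Compression ratio (compressed/original)   Time
    10      13/11                                     ~7500 us
    100     104/101                                   ~10000 us
    1500    119/1001                                  ~40000 us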

The large-buffer and dictionary compression functions are not covered in this post.

2. zlib format: deflate data wrapped with a zlib header and a zlib trailer.
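Per RFC 1950, the zlib header is two bytes long and the trailer is the 4-byte Adler-32 checksum of the uncompressed data. zlib provides its own adler32() function; the version below is only a straightforward, unoptimized sketch of the checksum, with the hypothetical name adler32_simple.

```c
#include <stddef.h>
#include <stdint.h>

/* Unoptimized Adler-32 as defined in RFC 1950.
   a is 1 plus the sum of all bytes, b is the sum of all intermediate
   values of a, both taken modulo 65521 (the largest prime below 2^16). */
uint32_t adler32_simple(const unsigned char *data, size_t len)
{
    uint32_t a = 1, b = 0;
    size_t i;

    for (i = 0; i < len; ++i) {
        a = (a + data[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}
```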

Compression function:

1) compress function

int compress (Bytef *dest, uLongf *destLen, const Bytef *source, uLong sourceLen);

Decompression function:

1) uncompress function

int uncompress (Bytef *dest, uLongf *destLen, const Bytef *source, uLong sourceLen);

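Parameters:

Bytef *dest: pointer to the output buffer;
uLongf *destLen: on entry, the size of the dest buffer; on return, the number of bytes actually written;
const Bytef *source: pointer to the input buffer;
uLong sourceLen: size of the input data.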

For usage, see the test_compress() function in example.c. To make testing easier, this post modifies the function as follows:

    void test_compress(Byte *compr, uLong comprLen, Byte *uncompr, uLong uncomprLen, Byte *data)
    {
        int err;
        uLong len = (uLong)strlen((const char*)data)+1;

        err = compress(compr, &comprLen, (const Bytef*)data, len);
        CHECK_ERR(err, "compress");

        err = uncompress(uncompr, &uncomprLen, compr, comprLen);
        CHECK_ERR(err, "uncompress");

        if (strcmp((char*)uncompr, (const char*)data)) {
            fprintf(stderr, "bad uncompress\n");
            exit(1);
        } else {
            printf("uncompress(): %s\n", (char *)uncompr);
        }
    }

Here data is produced by generate_data(). The performance results are:

    Bytes   Compression ratio (compressed/original)   Time
    10      19/11                                     ~7000 us
    101     109/101                                   ~8500 us
    1500    124/1501                                  ~10000 us

Note: compress() uses the library's default compression level. compress2() lets you choose the level (0-9; a higher level gives better compression but takes longer):

int compress2 (Bytef *dest, uLongf *destLen, const Bytef *source, uLong sourceLen, int level);

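In our project, compressing data with compress() reduced the data volume noticeably.

3. gzip format: deflate data wrapped with a gzip header and a gzip trailer.

zlib can read and write gzip (.gz) files through functions whose names start with "gz". test/minigzip.c also wraps them into the gz_compress and gz_uncompress helper functions. These are not covered further here.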
