CN101932012B

CN101932012B - Method for compressing sensor network data based on optimal order estimation and distributed clustering

Info

Publication number: CN101932012B
Application number: CN2010102380539A
Authority: CN
Inventors: 蒋鹏; 李胜强
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority date: 2010-07-27
Filing date: 2010-07-27
Publication date: 2013-09-18
Anticipated expiration: 2030-07-27
Also published as: CN101932012A

Abstract

The invention relates to a sensor network data compression method based on optimal order estimation and distributed clustering. Existing data compression methods are inefficient. The present invention utilizes the time-space correlation of data collected by sensor nodes, on the one hand, by introducing optimal order estimation, thereby defining the number of original data groups to be transmitted by the system, not only obtaining effective correlation coefficients, but also avoiding redundant coefficients On the other hand, the network is divided into clusters, and the node data is processed in units of clusters. This can not only improve the efficiency of the base station in processing the data of each node, but also enhance the ability of the base station to quickly locate outliers or abnormal nodes. The invention is applicable to the environment real-time monitoring system based on the wireless sensor network, can realize the effective compression of the wireless sensor network data, and effectively reduce the average energy consumption of nodes.

Description

A Data Compression Method for Sensor Networks Based on Optimal Order Estimation and Distributed Clustering

技术领域 technical field

本发明属于数据压缩技术领域，涉及一种基于最优阶估计与分布式分簇的传感器网络数据压缩方法。The invention belongs to the technical field of data compression, and relates to a sensor network data compression method based on optimal order estimation and distributed clustering.

背景技术 Background technique

基于无线传感器网络(WSNs)的监测系统中，各个传感器节点收集自身周围的局部信息，对其进行处理后传送至汇聚节点，汇聚节点汇总所有节点采集的局部数据得到感兴趣区域的整体信息。在无线传感器网络中，由于受多种因素比如背景噪音、节点失效、无线通信存在不稳定性及能量约束等的影响，节点获取、处理和传输的感知数据信息常常存在一定的误差，并具有一定程度的不确定性，然而在某些应用中通常允许一定的误差存在。即在保证应用要求的前提下，可以通过降低一定的数据精度来减少在网络中传输的数据量，从而降低网络中节点的能量消耗。无线传感网络中的数据压缩算法就是保证在一定数据精度的前提下，寻求一种有效的减少传输中数据量，从而降低节点能耗，提高整个网络综合性能的方法。在无线传感器网络的许多实际应用中，被监测区异常情况发生的概率总是相对较小，在没有异常发生的情况下，同一传感节点在连续采集数据时，前后连续时刻所采集数据必然存在很大相关性，同时，处于相邻区域的不同的传感器节点在同一时刻采集的数据必然具有空间相关性，如果将这些具有时间，空间冗余的数据都发送到基站必然耗费节点大量的能量，因此如何有效地消除节点感知数据在时间、空间上的冗余性已成为无线传感网络中数据压缩要解决的关键性问题。如何针对WSNs在环境监控方面的具体应用需要，设计出有效的无线传感网络数据压缩算法，是一个很有意义的研究课题。In a monitoring system based on wireless sensor networks (WSNs), each sensor node collects local information around itself, processes it and transmits it to the sink node, and the sink node summarizes the local data collected by all nodes to obtain the overall information of the region of interest. In a wireless sensor network, due to the influence of various factors such as background noise, node failure, wireless communication instability and energy constraints, the sensory data information acquired, processed and transmitted by nodes often has certain errors and certain errors. degree of uncertainty, however in some applications a certain amount of error is usually allowed. That is, under the premise of ensuring the application requirements, the amount of data transmitted in the network can be reduced by reducing a certain data accuracy, thereby reducing the energy consumption of nodes in the network. The data compression algorithm in the wireless sensor network is to seek a method to effectively reduce the amount of data in transmission, thereby reducing the energy consumption of nodes and improving the overall performance of the entire network under the premise of ensuring a certain data accuracy. In many practical applications of wireless sensor networks, the probability of anomalies in the monitored area is always relatively small. In the absence of anomalies, when the same sensor node collects data continuously, the data collected at successive moments must exist. At the same time, the data collected by different sensor nodes in adjacent areas at the same time must have spatial correlation. If these data with time and space redundancy are sent to the base station, it will consume a lot of energy for the nodes. Therefore, how to effectively eliminate the redundancy of node perception data in time and space has become a key problem to be solved in data compression in wireless sensor networks. How to design an effective wireless sensor network data compression algorithm for the specific application needs of WSNs in environmental monitoring is a very meaningful research topic.

无线传感器网络数据压缩算法利用节点感知数据中存在的时间、空间冗余，先直接传输若干组节点感知数据，从中提取同一节点不同时刻，同一时刻不同节点间的相关性，再结合分布式编码，在基站处恢复节点原始感知数据，节点处仅有简单的取模运算，大量算法运算被转移到基站，这样就能够在保证一定精度的情况下，最大限度地减小节点传输单位数据的能耗，而实际应用中仅对基站定期更换电池或直接长期供电也是可行的，故该种算法具有较广的应用前景。然而该类算法还存在一些缺点，比如在估计预测相关性系数时，对所要传输的原始数据的组数没有清晰的界定，若让节点传输过多原始数据，势必会耗费节点大量的能量，这样不仅不利于延长节点使用寿命，还会导致基站处要计算的相关性系数维数增大，即引入了冗余相关系数，从而使基站计算量过大，进而影响到系统的响应时间及数据精度；若节点传输的原始数据组数太少，基站无法计算出足够的相关性系数，则预测出的数据将很难满足精度要求，导致系统可用性降低，故如何有效地界定要传输的原始数据量是该算法值得深入探讨和解决的问题。另外，随着网络规模的增大，该种算法的多节点对应单一基站的模式就很难高效运行了，因为基站要接收各节点压缩编码并根据相关性系数逐一计算出各节点原始数据，这样在计算首个节点与最后一个节点数据时，必然存在一个较大时间差，故网络规模越大，时间差越大，这就增大了系统时延，影响了该算法的应用范围，基于这两点，最优阶分布式分簇结构树压缩算法一方面引入最优阶估计，从而界定系统要传输的原始数据组数，既要获得有效的相关性系数，又要避免冗余系数的产生，另一方面对网络进行分簇处理，以簇为单位处理节点数据，这样不仅可以提高基站处理各节点数据效率，还可以增强基站迅速定位产生异值或出现异常节点的能力，从而有效提高整个无线传感网络系统对监控区的综合监控能力。The wireless sensor network data compression algorithm uses the time and space redundancy in the node perception data, first directly transmits several groups of node perception data, extracts the correlation between different nodes at different times of the same node, and then combines distributed coding, Restore the original perception data of the node at the base station. There is only a simple modulo operation at the node, and a large number of algorithm operations are transferred to the base station. This can minimize the energy consumption of the unit data transmitted by the node while ensuring a certain accuracy. , but in practical application, it is feasible to replace the battery regularly or provide direct long-term power supply to the base station, so this algorithm has a broad application prospect. However, this type of algorithm still has some shortcomings. For example, when estimating the predicted correlation coefficient, there is no clear definition of the number of groups of original data to be transmitted. If the node is allowed to transmit too much original data, it will inevitably consume a lot of energy of the node. Not only is it not conducive to prolonging the service life of the node, but it will also increase the dimension of the correlation coefficient to be calculated at the base station, that is, the introduction of redundant correlation coefficients, so that the calculation amount of the base station is too large, which in turn affects the response time and data accuracy of the system ; If the number of original data groups transmitted by the node is too small, the base station cannot calculate enough correlation coefficients, and the predicted data will hardly meet the accuracy requirements, resulting in reduced system availability. Therefore, how to effectively define the amount of original data to be transmitted It is a problem worthy of in-depth discussion and solution of the algorithm. In addition, as the scale of the network increases, it is difficult for this algorithm to operate efficiently in the mode of multiple nodes corresponding to a single base station, because the base station needs to receive the compressed codes of each node and calculate the original data of each node one by one according to the correlation coefficient. When calculating the data of the first node and the last node, there must be a large time difference, so the larger the network scale, the greater the time difference, which increases the system delay and affects the application range of the algorithm. Based on these two points , the optimal order distributed clustering tree compression algorithm introduces optimal order estimation on the one hand, so as to define the number of original data groups to be transmitted by the system, not only to obtain effective correlation coefficients, but also to avoid the generation of redundant coefficients, and on the other hand On the one hand, the network is divided into clusters, and the node data is processed in units of clusters. This can not only improve the efficiency of the base station in processing the data of each node, but also enhance the ability of the base station to quickly locate outliers or abnormal nodes, thereby effectively improving the overall wireless transmission efficiency. The comprehensive monitoring ability of the sensor network system to the monitoring area.

发明内容 Contents of the invention

本发明的目的是针对现有技术的不足，提供了一种可用于环境监测的无线传感器网络(WSNs)数据压缩方法，使得由数据精度和节点平均能耗构成的综合指标最优化。The purpose of the present invention is to provide a wireless sensor network (WSNs) data compression method that can be used for environmental monitoring, so as to optimize the comprehensive index composed of data accuracy and node average energy consumption.

为了达到上述目的，本发明的技术方案是这样实现的：In order to achieve the above object, the technical solution of the present invention is achieved in that:

步骤(1).由监测区节点向基站传输各自监测值，所有节点每传输完一组监测值，基站采用联合信息准则(CIC)进行最优阶判定，若所得阶数不是最优，且节点所传数据组数没有超过N/3，节点继续传输原始监测值；否则基站由节点所传输的数值获得初始的预测系数矩阵Φ_m，并在基站处建立结构树，其中N表示一个算法周期包含的节点传输数据组数，所述的获得初始预测系数矩阵方法和建立结构树方法均为成熟技术；Step (1). The nodes in the monitoring area transmit their monitoring values to the base station. After all nodes transmit a set of monitoring values, the base station uses the joint information criterion (CIC) to determine the optimal order. If the obtained order is not optimal, and the node If the number of transmitted data sets does not exceed N/3, the node continues to transmit the original monitoring value; otherwise, the base station obtains the initial prediction coefficient matrix Φ _m from the value transmitted by the node, and builds a structure tree at the base station, where N represents an algorithm cycle containing The number of node transmission data groups, the method of obtaining the initial prediction coefficient matrix and the method of establishing the structure tree are mature technologies;

步骤(2).基站根据预测系数矩阵Φ_m，将监控区节点划分为若干簇，以簇为单位，基站根据节点监测值产生压缩指令，节点根据压缩指令产生所要传输的二进制位数i；基站轮流指定簇内节点担当簇头，簇头负责传输自身未经压缩的监测值及簇内其它节点的压缩编码至基站，同时簇头接收来自基站的指令；Step (2). The base station divides the nodes in the monitoring area into several clusters according to the prediction coefficient matrix Φ _m , and takes the cluster as the unit. The base station generates a compression command according to the node monitoring value, and the node generates the binary digit i to be transmitted according to the compression command; the base station Designate the nodes in the cluster in turn to act as the cluster head, and the cluster head is responsible for transmitting its own uncompressed monitoring values and the compressed codes of other nodes in the cluster to the base station, while the cluster head receives instructions from the base station;

步骤(3).簇内各节点经簇头依次获得基站压缩指令，各节点将其模数转换后的二进制值对2ⁱ取模，得到i位二进制压缩码，将该压缩码经簇头传至基站，同时簇头向基站传输其未经压缩的监测值；Step (3). Each node in the cluster obtains the base station compression instruction sequentially through the cluster head, and each node takes the modulus of the binary value after its modulus conversion to 2 ⁱ to obtain an i-bit binary compressed code, and transmits the compressed code through the cluster head to the base station, while the cluster head transmits its uncompressed monitoring value to the base station;

步骤(4).基站以簇为单位由步骤(1)中计算的预测系数矩阵Φ_m及簇头所传原始监测值，可得到簇内各节点估计值

(

对应第m个节点第r组监测值的估计值)，再由该节点的i位压缩码可以在结构树中定位出一个子序列，由于子序列间隔距离2^i-1Δ以概率P大于均方误差N_r，m，故在该子序列中离

最近的序列值即为节点最终估计值

同时

又成为估计同一时刻下一节点的已知条件，即在更新Φ_m时要考虑新估计出的数值；Step (4). The base station uses the cluster as a unit to obtain the estimated value of each node in the cluster from the prediction coefficient matrix Φ _m calculated in step (1) and the original monitoring value transmitted by the cluster head

(

Corresponding to the estimated value of the r-th group monitoring value of the m-th node), and then a sub-sequence can be located in the structure tree by the i-bit compressed code of the node, since the sub-sequence interval distance 2 ^i-1 Δ is greater than the average square error N _{r, m} , so in this subsequence

The most recent sequence value is the node's final estimate

at the same time

It becomes a known condition for estimating the next node at the same moment, that is, the newly estimated value should be considered when updating Φ _m ;

步骤(5).当获得各簇所有节点的最终估计值后，由基站计算并发布下一时刻各簇内节点需要的压缩指令，当节点监测到的数值组数达到N，则跳转至步骤(1)，否则跳转至步骤(2)。Step (5). After obtaining the final estimated value of all nodes in each cluster, the base station calculates and issues the compression commands required by the nodes in each cluster at the next moment. When the number of value groups monitored by the node reaches N, jump to step (1), otherwise jump to step (2).

本发明提出了基于最优阶估计与分布式分簇的传感器网络数据压缩方法。由于节点能量主要消耗在传输原始监测信息上，通过引入最优阶估计，减少了节点向基站传输的原始数据的组数，从而降低了节点平均能耗；将计算集中于时间域和空间域上与待估计节点更加接近的系列节点值上，剔除了与待估计节点相关性微弱的边远节点，从而有效降低了算法所处理数据的维度，提高了算法运算速度及节点数据恢复精度；若监控区网络规模较大，基站对区域节点分簇处理后，如果区域出现异常，基站能够迅速定位异常区所在的簇，进而与监测到异常的节点直接建立连接，与全区域逐个搜寻相比，可有效降低系统时延，提高系统监控实时性。采用本发明方法，在不同的网络规模下，基于最优阶估计与分布式分簇的传感器网络数据压缩方法在数据精度和节点平均能耗构成的综合指标上优于直接的分布式结构树压缩算法。The invention proposes a sensor network data compression method based on optimal order estimation and distributed clustering. Since the energy of nodes is mainly consumed in the transmission of original monitoring information, by introducing optimal order estimation, the number of groups of raw data transmitted by nodes to the base station is reduced, thereby reducing the average energy consumption of nodes; the calculation is concentrated on the time domain and the space domain On the series of node values that are closer to the nodes to be estimated, outlying nodes that have a weak correlation with the nodes to be estimated are eliminated, thereby effectively reducing the dimension of the data processed by the algorithm, improving the algorithm operation speed and the accuracy of node data recovery; if the monitoring area The network scale is large. After the base station clusters the regional nodes, if there is an abnormality in the region, the base station can quickly locate the cluster where the abnormal area is located, and then directly establish a connection with the node that detected the abnormality. Compared with searching the entire region one by one, it can effectively Reduce system delay and improve real-time performance of system monitoring. Using the method of the present invention, under different network scales, the sensor network data compression method based on optimal order estimation and distributed clustering is superior to direct distributed structure tree compression in terms of data accuracy and comprehensive indicators of node average energy consumption. algorithm.

附图说明 Description of drawings

图1是分布式编码方法框图；Fig. 1 is a block diagram of a distributed coding method;

图2是解码结构树；Figure 2 is a decoding structure tree;

图3是本发明方法流程图。Fig. 3 is a flow chart of the method of the present invention.

具体实施方式 Detailed ways

本发明的核心思想是：一方面引入最优阶估计，从而界定系统要传输的原始数据组数，既要获得有效的相关性系数，又要避免冗余系数的产生，另一方面对网络进行分簇处理，以簇为单位处理节点数据，这样不仅可以提高基站处理各节点数据效率，还可以增强基站迅速定位产生异值或出现异常节点的能力，从而有效提高整个无线传感网络系统对监控区的监控能力。The core idea of the present invention is: on the one hand, the optimal order estimation is introduced to define the number of original data groups to be transmitted by the system. It is necessary to obtain an effective correlation coefficient and avoid the generation of redundant coefficients. On the other hand, the network Clustering processing, processing node data in units of clusters, which can not only improve the efficiency of the base station in processing the data of each node, but also enhance the ability of the base station to quickly locate outliers or abnormal nodes, thereby effectively improving the monitoring of the entire wireless sensor network system. District monitoring capabilities.

下面结合附图对本发明做进一步的详细描述。The present invention will be described in further detail below in conjunction with the accompanying drawings.

如图1所示，分布式编码即监控区有两个或两个以上节点同时感知原始数据时，在节点间采用不对称的编码方式，建立无线传感器网络分布式压缩结构，当节点采集的数据序列为高斯相关独立同分布离散序列时，实验证明，节点B在压缩自身数据时，无论其是否知道当前时刻A，B节点数据的相关性系数，采用上述分布式压缩模式，均能获得等效的压缩性能。将框图由两个节点扩展到n各节点，根据自身之前采集数据压缩当前数据所得的压缩码，由基站计算并确定各节点数据之间相关性，并向各节点发送各自的压缩码比特位数，最后还原出各节点原始监测值。As shown in Figure 1, distributed coding means that when two or more nodes in the monitoring area perceive the original data at the same time, an asymmetric coding method is used between the nodes to establish a distributed compression structure of the wireless sensor network. When the data collected by the nodes When the sequence is a Gaussian correlated independent and identically distributed discrete sequence, the experiment proves that when node B compresses its own data, regardless of whether it knows the correlation coefficient of node A and node B data at the current moment, using the above distributed compression mode can obtain equivalent compression performance. Extend the block diagram from two nodes to n nodes, and compress the compression code obtained by compressing the current data according to the data collected before, and calculate and determine the correlation between the data of each node by the base station, and send the number of bits of the compression code to each node , and finally restore the original monitoring value of each node.

基于上述分布式编码基本原理，分布式结构树压缩算法首先由所有节点向基站传输N/3组原始值，N的大小可根据具体应用确定，基站根据这些原始值建立一个结构树，并计算出预测系数矩阵Φ_m，基站还需计算出与各节点对应的压缩指令，即下一采集时刻各节点需将各自监测值(经A/D后的)压缩后的位数，与此同时，基站要指定一个节点传输其未压缩的监测值，为使监控区各节点能量被均衡消耗，该节点可由基站轮流指定，节点按各自收到的指令，对原始监测值进行2ⁱ取模运算，从而将各自采集数据压缩至i位，基站处接收到各节点i位编码后，可以在之前建立的结构树中定位出一个较小数值序列，结合指定节点所传原始监测值及之前计算的预测系数矩阵Φ_m，恢复出该i位压缩码对应的原始监测值，当前时刻所有节点值恢复完毕，基站将更新预测系数矩阵及各节点下一时刻编码位数，类此循环往复，当前时刻各节点的值将不断被预测估计出来。有必要对以上算法涉及到的结构树，预测系数及i值的确定作进一步的说明，结构树首先是以前述N/3组原始数据的均值为起点，以Δ为间隔向两端扩展，Δ大小决定算法精度，扩展范围由具体应用决定，对上述以Δ为间隔的扩展序列进行奇偶序列分裂，可得到间距为2Δ的两组子序列，以同样的方式可再对子序列进行分裂，分裂次数由i的位数决定，i次分裂后，各子序列相邻节点间距为2ⁱΔ，第i层每个子序列对应唯一的i位编码，根据以上表述即可建立如图2所示解码结构树，当基站按获得某节点发送的i位编码，可以定位出对应的子序列，再由预测系数计算出一个估计值，通过在定位的子序列中寻找与估计值最接近的序列值，从而恢复出该节点监测值。Based on the above-mentioned basic principle of distributed coding, the distributed structure tree compression algorithm first transmits N/3 sets of original values from all nodes to the base station. The size of N can be determined according to the specific application. The base station builds a structure tree based on these original values and calculates Prediction coefficient matrix Φ _m , the base station also needs to calculate the compression command corresponding to each node, that is, each node needs to compress the number of digits of the respective monitoring values (after A/D) at the next collection time, at the same time, the base station To designate a node to transmit its uncompressed monitoring value, in order to make the energy of each node in the monitoring area be consumed in a balanced manner, the node can be designated by the base station in turn, and the nodes perform 2 ⁱ modulo calculations on the original monitoring value according to the instructions received by each node, so that Compress the collected data to i bits, and after receiving the i-bit codes of each node at the base station, a smaller numerical sequence can be located in the previously established structure tree, combined with the original monitoring value transmitted by the designated node and the previously calculated prediction coefficient The matrix Φ _m restores the original monitoring value corresponding to the i-bit compressed code. After all the node values are restored at the current moment, the base station will update the prediction coefficient matrix and the number of coded digits of each node at the next moment, and so on. The value of will be continuously estimated by the forecast. It is necessary to further explain the structure tree involved in the above algorithm, the determination of the prediction coefficient and the value of i. The structure tree first starts from the mean value of the aforementioned N/3 groups of original data, and expands to both ends at intervals of Δ, Δ The size determines the accuracy of the algorithm, and the extension range is determined by the specific application. The above-mentioned extended sequence with Δ as the interval is split into the parity sequence, and two sets of subsequences with a distance of 2Δ can be obtained. In the same way, the subsequence can be split again. The number of times is determined by the number of bits i. After the i split, the distance between adjacent nodes of each subsequence is 2 ⁱ Δ, and each subsequence in the i-th layer corresponds to a unique i-bit code. According to the above expression, the decoding shown in Figure 2 can be established Structure tree, when the base station obtains the i-bit code sent by a certain node, it can locate the corresponding subsequence, and then calculate an estimated value by the prediction coefficient, and find the sequence value closest to the estimated value in the located subsequence, Thus recovering the monitoring value of the node.

本方法在上述二维空间实现无线传感器网络数据压缩遵循以下的过程，见图3：This method realizes wireless sensor network data compression in the above-mentioned two-dimensional space and follows the following process, as shown in Figure 3:

步骤(1).由监测区节点向基站传输各自监测值，所有节点每传输完一组监测值，基站采用联合信息准则(CIC)进行最优阶判定，若所得阶数不是最优，且节点所传数据组数没有超过N/3，节点继续传输原始监测值；否则基站由节点所传输的数值获得初始的预测系数矩阵Φ_m，并在基站处建立结构树，其中N表示一个算法周期包含的节点传输数据组数，获得初始预测系数方法和建立结构树方法均为成熟技术；Step (1). The nodes in the monitoring area transmit their monitoring values to the base station. After all nodes transmit a set of monitoring values, the base station uses the joint information criterion (CIC) to determine the optimal order. If the obtained order is not optimal, and the node If the number of transmitted data sets does not exceed N/3, the node continues to transmit the original monitoring value; otherwise, the base station obtains the initial prediction coefficient matrix Φ _m from the value transmitted by the node, and builds a structure tree at the base station, where N represents an algorithm cycle containing The number of data sets transmitted by the nodes, the method of obtaining the initial prediction coefficient and the method of establishing the structure tree are all mature technologies;

步骤(2).基站根据预测系数矩阵Φ_m，将监控区节点划分为若干簇，以簇为单位，基站根据节点监测值产生压缩指令，节点根据压缩指令产生所要传输的二进制位数i；基站轮流指定簇内节点担当簇头，簇头负责传输自身未经压缩的监测值及簇内其它节点的压缩编码至基站，同时簇头接收来自基站的指令，其中i值根据公式

求得，其中p是概率，Δ是图2结构树中序列间距，

是方差；Step (2). The base station divides the nodes in the monitoring area into several clusters according to the prediction coefficient matrix Φ _m , and takes the cluster as the unit. The base station generates a compression command according to the node monitoring value, and the node generates the binary digit i to be transmitted according to the compression command; the base station Designate the nodes in the cluster in turn to act as cluster heads. The cluster head is responsible for transmitting its own uncompressed monitoring values and the compressed codes of other nodes in the cluster to the base station. At the same time, the cluster head receives instructions from the base station, where the value of i is according to the formula

Obtained, where p is the probability, Δ is the sequence distance in the structure tree in Figure 2,

is the variance;

步骤(3).簇内各节点经簇头依次获得基站压缩指令，各节点将其模数转换后的二进制值对2ⁱ取模，得到i位二进制压缩码，将该压缩码经簇头传至基站，同时，簇头向基站传输其未经压缩的监测值；Step (3). Each node in the cluster obtains the base station compression instruction sequentially through the cluster head, and each node takes the modulus of the binary value after its modulus conversion to 2 ⁱ to obtain an i-bit binary compressed code, and transmits the compressed code through the cluster head to the base station, and at the same time, the cluster head transmits its uncompressed monitoring value to the base station;

步骤(4).基站以簇为单位由步骤(1)中计算的预测系数矩阵Φ_m及簇头所传原始监测值，可得到簇内各节点估计值(对应第m个节点第r组监测值的估计值)，再由该节点的i位压缩码可以在结构树中定位出一个子序列，由于子序列间隔距离2^i-1Δ以概率P大于均方误差N_r，m，故在该子序列中离

最近的序列值即为节点最终估计值

，同时，

又成为估计同一时刻下一节点的已知条件，即在更新Φ_m时要考虑新估计出的数值；Step (4). The base station uses the cluster as a unit to obtain the estimated value of each node in the cluster from the prediction coefficient matrix Φ _m calculated in step (1) and the original monitoring value transmitted by the cluster head (corresponding to the estimated value of the r-th group monitoring value of the m-th node), and then a sub-sequence can be located in the structure tree by the i-bit compressed code of the node. Since the sub-sequence interval distance 2 ^i-1 Δ is greater than mean square error N _{r, m} , so in this subsequence

The most recent sequence value is the node's final estimate

,at the same time,

总之，本发明提出的是基于最优阶估计与分布式分簇的传感器网络数据压缩方法：网络节点在监控区随机分布，经最优阶估计后，本方法能够将具有空间相关性的节点划分在同簇，通过在基站建立预测系数矩阵及解码结构树，实现对节点监控值编码的恢复与还原，节点仅仅在预测系数初始化过程中传输监测值，然后仅需作简单取模运算实现监控值编码并传输数据量相对较小的编码，节点平均能耗能够被大幅度降低，进而有效提高传感器网络综合监控性能。应当说明的是，基站处利用最优阶估计的方式不同(如应用在空间相关性阶估计或时间相关性阶估计)等方法都是不脱离本发明技术方案的精神和范围的。In short, the present invention proposes a sensor network data compression method based on optimal order estimation and distributed clustering: network nodes are randomly distributed in the monitoring area, after optimal order estimation, this method can divide nodes with spatial correlation In the same cluster, by establishing a prediction coefficient matrix and a decoding structure tree at the base station, the recovery and restoration of the node monitoring value encoding is realized. The node only transmits the monitoring value during the initialization process of the prediction coefficient, and then only needs to perform a simple modulo operation to realize the monitoring value. By encoding and transmitting codes with a relatively small amount of data, the average energy consumption of nodes can be greatly reduced, thereby effectively improving the comprehensive monitoring performance of sensor networks. It should be noted that methods such as different methods of using optimal order estimation at the base station (such as applying to spatial correlation order estimation or time correlation order estimation) do not depart from the spirit and scope of the technical solution of the present invention.

Claims

1. The sensor network data compression method based on optimal order estimation and distributed clustering, it is characterized in that the method comprises the steps:

Step (1). The nodes in the monitoring area transmit their monitoring values to the base station. After all nodes transmit a set of monitoring values, the base station uses the joint information criterion to determine the optimal order. If the obtained order is not optimal and the data set transmitted by the node If the number does not exceed N/3, the node continues to transmit the original monitoring value; if the obtained order is optimal, the base station obtains the initial prediction coefficient matrix Φ _m from the value transmitted by the node, and builds a structure tree at the base station, where N Indicates the number of node transmission data groups included in an algorithm cycle;

Step (2). The base station divides the nodes in the monitoring area into several clusters according to the prediction coefficient matrix Φ _m ; taking the cluster as a unit, the base station generates a compression instruction according to the node monitoring value, and the node generates the binary digit i to be transmitted according to the compression instruction; the base station Designate the nodes in the cluster in turn to act as the cluster head, and the cluster head is responsible for transmitting its own uncompressed monitoring values and the compressed codes of other nodes in the cluster to the base station, while the cluster head receives instructions from the base station;

Step (3). Each node in the cluster obtains the base station compression instruction sequentially through the cluster head, and each node takes the modulus of the binary value after its modulus conversion to 2 ⁱ to obtain an i-bit binary compressed code, and transmits the compressed code through the cluster head to the base station, while the cluster head transmits its uncompressed monitoring value to the base station;

Step (4). The base station takes the cluster as a unit, and obtains the estimated value of each node in the cluster from the prediction coefficient matrix Φ _m calculated in step (1) and the original monitoring value transmitted by the cluster head

Then a subsequence is located in the structure tree by the i-bit compression code of the node, and in the subsequence

The most recent sequence value is the node's final estimate

at the same time

It becomes a known condition for estimating the next node at the same moment, that is, the newly estimated value is considered when updating Φ _m ;

Step (5). After obtaining the final estimated value of all nodes in each cluster, the base station calculates and issues the compression commands required by the nodes in each cluster at the next moment. If the number of value groups detected by the node reaches N, then jump to step (1); if the number of value groups monitored by the node does not reach N, then jump to step (2);

Where r represents the number of groups, m represents the node number, Φ _m represents the prediction coefficient matrix of m nodes,

Indicates the final estimated value of the rth group of the mth node,

Indicates the estimated value of the r-th group of monitoring values of the m-th node.