Deprecated: The each() function is deprecated. This message will be suppressed on further calls in /home/zhenxiangba/zhenxiangba.com/public_html/phproxy-improved-master/index.php on line 456
CN115409155A - Information cascade prediction system and method based on Transformer enhanced Hooke process - Google Patents
[go: Go Back, main page]

CN115409155A - Information cascade prediction system and method based on Transformer enhanced Hooke process - Google Patents

Information cascade prediction system and method based on Transformer enhanced Hooke process Download PDF

Info

Publication number
CN115409155A
CN115409155A CN202210983826.9A CN202210983826A CN115409155A CN 115409155 A CN115409155 A CN 115409155A CN 202210983826 A CN202210983826 A CN 202210983826A CN 115409155 A CN115409155 A CN 115409155A
Authority
CN
China
Prior art keywords
node
embedding
global
main path
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210983826.9A
Other languages
Chinese (zh)
Other versions
CN115409155B (en
Inventor
周帆
余柳
钟婷
徐增
匡平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202210983826.9A priority Critical patent/CN115409155B/en
Publication of CN115409155A publication Critical patent/CN115409155A/en
Application granted granted Critical
Publication of CN115409155B publication Critical patent/CN115409155B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/40Business processes related to social networking or social networking services
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0202Market predictions or forecasting for commercial activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

本发明公开了一种基于Transformer增强霍克斯过程的信息级联预测系统及方法,包括用户嵌入模块、全局依赖模块、局部依赖模块、强度函数获取模块和流行度预测模块,首先获取用户position‑wise嵌入和时间编码作为用户嵌入,然后以拓扑结构的视角,引入路径感知假设,并从全局嵌入和局部模式两个角度设计两层注意力层,参数化霍克斯过程的强度函数,并结合霍克斯过程、全局嵌入和局部模式,学习信息级联扩散过程耦合的时间和拓扑随机特性,以进行流行度预测;本发明扩展了传统的霍克斯过程,并有效地从连续时间域中获取知识,提升流行度预测准确性。

Figure 202210983826

The invention discloses an information cascading prediction system and method based on Transformer-enhanced Hawkes process, including a user embedding module, a global dependency module, a local dependency module, an intensity function acquisition module and a popularity prediction module. Firstly, the user position- Wise embedding and time coding are used as user embedding, and then from the perspective of topology, the path-aware assumption is introduced, and two layers of attention are designed from the perspectives of global embedding and local mode, parameterizing the strength function of the Hawkes process, and combining Hawkes process, global embedding and local patterns, learning the temporal and topological stochastic properties of information cascade diffusion process coupling for popularity prediction; the present invention extends the traditional Hawkes process and effectively learns from the continuous time domain Gain knowledge to improve the accuracy of popularity predictions.

Figure 202210983826

Description

Information cascade prediction system and method based on Transformer enhanced Hooke process
Technical Field
The invention belongs to the technical field of Information processing, relates to Information diffusion (Information diffusion), information Cascade (Information Cascade) and Popularity Prediction (Popularity Prediction) based on Deep Learning (Deep Learning), and particularly relates to a method and a system (Hawkesformer) for introducing Transformer into a Hooke Process (Hawkes Process) to perform coupling modeling on time characteristics and topological structure of Cascade and finally realize Popularity Prediction (Cascade Prediction).
Background
Content sharing through social media platforms such as Twitter, facebook and microblog has become a main channel for expression opinions and reaction opinions of individuals on various topics. This trend has produced a large amount of data describing the manner in which information is disseminated, and has brought unprecedented convenience for generating, communicating, and disseminating information items. The initial distribution of information (e.g., tweet) and subsequent sharing and forwarding (e.g., retweet) forms a concatenation of information, representing what is commonly referred to as the information dissemination process. Similar phenomena are found in other (non-social media) settings, paper quotations, blog spaces, and email forwarding. The large number of user sharing behaviors promotes the rapid and massive spread of information concatenation. A typical task of information concatenation is to predict the scale of a potentially affected user after a certain period of time (tweet, microblog, etc.), i.e., popularity prediction (size of information concatenation), which brings enormous economic and social value in many downstream applications. For example, knowing which type of advertisement has maximized influence may make ad placement more accurate; predicting rumors' potential impact users so that administrators can intervene early to avoid serious consequences, etc.
The existing methods are mainly divided into three categories: (i) Feature-based models that mainly explore feature sets extracted from structures, time series, user content and profiles; however, they rely on a large number of hand-made feature projects and cannot be generalized from one field to another; (ii) Probabilistic generative models, aimed at modeling events by classical examples of point-in-time processes (TPPs), including poisson and hokes processes; however, the existing method makes a strong assumption on a cascading diffusion mechanism, and has limited learning capacity on large-scale cascading data; (iii) The deep learning model focuses more on static time cascade diffusion in a specific snapshot in a discrete time domain, has little focus on the whole evolution process and fully utilizes all cascade dynamics, and mainly uses a simple event sequence model (such as RNN) to model cascade events or is assisted by GNN to separately model time characteristics and topological characteristics; but the two are coupled in nature, and the mode of separated modeling loses cross-domain information and reduces the expression and prediction capability of the model.
Disclosure of Invention
Aiming at the problems of limited learning capacity, cross-domain information loss and the like of the current information cascade prediction method, the invention aims to provide a novel modeling mode for the cascade diffusion Process, generalize the cascade diffusion Process into a Directed Acyclic Graph (DAG) which is continuously diffused in a continuous time domain, introduce a Transformer (attention mechanism) into a Hawkes Process, introduce an attention mechanism into the Hawkes Process to model time characteristics and topological characteristics in the continuous time domain in a coupling manner, realize effective modeling of the cascade diffusion Process and improve the accuracy of popularity prediction.
The general idea of the invention is as follows: two levels of attention modules are connected together to parameterize the intensity function of the hokes process. The strength function ensures that the coupling time and topological dependence of the cascade is modeled in the continuous time domain. Specifically, at a first level, a global (global) dependent module is designed to dynamically capture long-term diffusion processes of events that have occurred in the past in the cascade, and main/non-main path assumptions are proposed to adaptively integrate the diffusion processes onto the underlying DAG. This allows each node participating in the cascade anywhere to update its current hidden state. At the second level, a local module is designed to capture the short-term evolution rate of the information cascade by encoding local patterns in a fixed time slice window. Through the design of two levels, the representation of the nodes on the global and local is taken, and finally, a full connection layer is fed for popularity prediction.
Before a particular flow is given, some basic terms and definitions are given. Given a concatenation of information C k And it is at observation time t o Forwarding history within a network
Figure BDA0003801224340000021
Wherein t is j ∈(0,t o ]L is within the observation time window (t) 0 ,t 0 +t o ) Number of forwarding of t 0 Indicating the start time. Each triplet (t) j ,v j ,u j ) Is represented at t j Time of day, user u j Forwards user v j . For example, the forwarding history of FIG. 1 is:
Figure BDA0003801224340000022
(t 4 ,u 1 ,u 4 ),(t 5 ,u 3 ,u 5 ) For simplicity, the following is omitted
Figure BDA0003801224340000023
In addition, we use
Figure BDA0003801224340000024
Figure BDA0003801224340000025
Representing the set of timestamps before the current time.
Based on the above inventive concept, the information cascade prediction system based on the Transformer enhanced hokes process provided by the invention comprises:
the User embedding module (Initial User indexes) is used for acquiring User position-wise embedding and time coding and taking the User position-wise embedding and the time coding as User embedding;
a Global Dependency Module (Global Dependency Module) for distinguishing all predecessor nodes as main path nodes and non-main path nodes for each user node in user embedding, and obtaining attention probability matrixes of the main path nodes and the non-main path nodes, further obtaining Global Dependency of each node, and obtaining Global embedding of all nodes through full connection;
a Local Dependency Module (Local Dependency Module) for obtaining a Local mode of each node according to a set time slice window, and then performing pooling operation on the Local mode of each node to obtain a pooled representation of each node; finally, generating a hidden vector of each node local mode based on a mask attention mechanism;
the intensity function acquisition module is used for parameterizing an intensity function of the Hox process according to the global embedding vector and the local mode hiding vector;
and the popularity prediction module is used for generating a popularity prediction result by combining the integral of the Hox process and the full-connection processing result of the global embedding and local mode hiding vectors.
The user embedding module, the user embedding X, is composed of position-wise embedding and time coding and is used as the input of a subsequent module (a global dependency module).
The user embedded module comprises a Position-wise embedded acquisition submodule and a time code acquisition submodule.
Position-wise Embedded acquisition submodule, including shared matrix
Figure BDA0003801224340000031
Where D is the user-embedded dimension and L is the maximum number of hops. The jth column of U represents the jth forwarding user, and each cascade of nodes in the same order on the time axis share an initial embedding. The shared matrix U is first randomly initialized and then subsequent context embeddings related to the underlying cascade are dynamically updated by subsequent global dependency modules. The shared matrix U greatly reduces the computational overhead, making the system more efficient. In a cascade, for the j-th forwarded user u j Let v j Representing its one-hot index vector on the shared matrix U, then user U j Position-wise embedding of (A) is determined by (U) v j The index is obtained, wherein
Figure BDA0003801224340000032
Representing the set of one-hot indexes embedded by all users. Thus, the Position-wise embedding of the user, which is derived by the Position-wise embedding derivation submodule, is UV based on the shared matrix U.
2. The time code acquisition submodule (Temporal Encoding), a key component of the present invention, is a two-level attention module, and since the attention mechanism does not consider the sequence dependency, the time code acquisition submodule characterizes the sequence position by using the time code z, which is defined as follows:
Figure BDA0003801224340000033
the temporal coding is a variant from the original transform absolute position coding. Wherein
Figure BDA0003801224340000034
Is time-coded, D is the dimension of the time-coding, [ z (t) j )] i Represents the value of the ith dimension of the code, i ∈ [1, D ]]。
Figure BDA0003801224340000035
Figure BDA0003801224340000036
Is a collection of temporal codes.
Thus, the forwarding sequence
Figure BDA0003801224340000037
The initial user embedding of (a) can eventually be written as: x = UV + Z. The initial user embedding X consists of a learnable shared position-wise embedding UV together with a fixed time coding Z.
The global dependency module is referred to as a global module for short; the global dependency of the information cascade can be seen as an indication that a certain user may promote or suppress another user during long-term diffusion.
The global dependency module includes:
the path perception attention layer is used for distinguishing all the precursor nodes into a main path node and a non-main path node for each user node in user embedding, acquiring long-term dependence attention scores of the main path node and the non-main path node and constructing an attention probability matrix of the main path and the non-main path; acquiring global dependence of each node according to the attention probability matrix of the main path and the non-main path of each node;
and the first output network is used for carrying out full connection processing on the global dependence of each node to obtain global embedding of all nodes.
After the user embedding is obtained, a path perception attention layer is designed in the global module to parameterize the intensity function of the Hawkesformer, the long-term diffusion dependence between the input cascade nodes is captured, and the time topological dependence of the coupling of the main path node and the non-main path node is extracted by using the user complete diffusion dependence extraction sub-module.
The theory of the hokes process states that: the final popularity of information concatenation is related to the past participation of the user, which means that the current node affects not only its immediate (immedate) forwarder, but also non-immediate (non-immedate) forwarders by transitivity. In order to indicate the influence of each node on the future popularity of the cascade from a topological point of view, a path-aware assumption, i.e. a primary path/non-primary path assumption, is therefore proposed in the global module: consider a node on the main path to be current node u j The contribution of (b) is larger, the node on the non-main path is next to u j The arrival contribution of the node is small but still has an impact on the final popularity, which also follows the assumptions of the hokes process. For the current node u j All predecessor nodes
Figure BDA0003801224340000041
Figure BDA0003801224340000042
Are divided into two categories: (1) Nodes on the main path, denoted
Figure BDA0003801224340000043
It includes a front node u j Backtracking to root node u 0 All previous nodes on the entire forwarding path; (2) Nodes on the non-primary path, i.e. all nodes except the primary path node, are marked as
Figure BDA0003801224340000044
An example is given in fig. 2, where for the current node u 4 The nodes on the primary path are represented by (1) and (2), and the nodes on the non-primary path are represented by (3).
Based on the above explanation, the implementation steps of the global dependency module are as follows:
given a current node u j To obtain a node between it and a main path
Figure BDA0003801224340000045
Long term dependent attention score of (c):
Figure BDA0003801224340000046
wherein a, b and
Figure BDA0003801224340000051
respectively representing user embedded matrix X main path Q (Query) matrix
Figure BDA0003801224340000052
Non-main path Q (Query) matrix
Figure BDA0003801224340000053
And K (Key) matrix
Figure BDA0003801224340000054
The transformed users embed vectors, and the three transformation matrices are used for distinguishing roles of the users in long-term dependence. Wherein,<·,·>representing the inner product function. Similarly, current node u j A node in the non-main path
Figure BDA0003801224340000055
Long term dependence attention scores of (a) are:
Figure BDA0003801224340000056
then, attention weighting and calculation are performed through the following formula to obtain the user u j Complete diffusion dependence e j
Figure BDA0003801224340000057
Finally, the context embedding of each node is obtained according to the attention scores of the primary path and the non-primary path of each node: e = { E = { E) 0 ,e 1 ,…,e j ,…,e L Each line in E represents the global dependency of a certain node in the cascade;
Figure BDA0003801224340000058
embedding matrix X via V (Value) matrix on behalf of user
Figure BDA0003801224340000059
The converted user embeds the vector.
Above for the current single user u j Is calculated. In practical operation, the above attention is relied on using matrix parallel to speed up the operation. The matrix calculation flow of global dependency can refer to fig. 3.
Firstly, based on the path perception attention layer, the attention scores of the current node on the main path and the non-main path are calculated through the following formulas:
Figure BDA00038012243400000519
Figure BDA00038012243400000520
wherein M is P
Figure BDA00038012243400000510
One-hot matrix for respectively reserving attention scores of nodes on the main path and the non-main path if
Figure BDA00038012243400000511
Then element
Figure BDA00038012243400000512
Accordingly, if
Figure BDA00038012243400000513
Finally, the attention probability matrix for the primary and non-primary paths can be derived as:
Figure BDA00038012243400000514
where D is the dimension of the input embedding. Mask matrix
Figure BDA00038012243400000515
To prevent the system from peeping into future data if i<j,M i,j =0, otherwise M i,j And = - ∞. Matrix M forces the softmax function to be only at u j Distributing valid long-term dependent attention on previous forwarders to subsequent users (i.e., later than t) j Active user) is assigned 0. Each column vector in matrix A represents u j To its precursor node
Figure BDA00038012243400000516
Including the main path
Figure BDA00038012243400000517
And a non-main path
Figure BDA00038012243400000518
A node of (c).
Then, a matrix form of the global dependency of each node is obtained according to the following formula:
E=(W V X)A
in order to learn more semantic information, the path perception attention layer can be designed as a module based on a multi-head attention mechanism, context embedding and mapping are carried out on different subspaces, the subspaces can use the matrix multiplication parallel operation given in the foregoing, and then a plurality of heads E are calculated 1 ,E 2 ,…,E H Then connecting them into E = W O Concat(E 1 ,E 2 ,…,E H ) Wherein
Figure BDA0003801224340000061
Is an aggregation matrix. Finally, we feed E into the first output network (here the fully connected neural network FCNN) resulting in a global embedding of all cascades:
Figure BDA0003801224340000062
h(t j )=H(:,j)
wherein,
Figure BDA0003801224340000063
is a parameter of the neural network; h (t) j ) Is that
Figure BDA0003801224340000064
J-th column in (1) represents user u j Is indicative of long term dependence.
The local dependency module may be referred to as a local module for short; the rate of increase of the forwarding number in the information concatenation is not stable, and the final popularity is greatly influenced by short-term outbreaks in the propagation process. To better embed the evolution rate of the information cascade, pooled attention is used here to capture local patterns. And performing a pooling operation on the local mode of each current node based on the learned user global embedded representation in the global dependency module.
The local dependency module includes:
the local mode acquisition submodule is used for intercepting a part positioned in front of each node from the global dependence according to a set time slice window to obtain a local mode of each node;
the pooling submodule is used for performing pooling operation on the local mode of each node to obtain pooling representation of each node;
and the mask attention layer is used for dynamically aggregating the local mode pooling representation of each node and the nodes before the node, and generating the hidden vector of the local mode of each node.
Each current node has a local mode, and a fixed time slice window t is set s Utilizing a local mode acquisition submodule to intercept a part positioned in front of each node from the global dependency to obtain a local mode of each node; thus, the local mode of the current node is determined by the current node u j And a global dependency component of some predecessor nodes, denoted as
Figure BDA0003801224340000066
Representing the set of all local patterns of a piece of information concatenation.
For each current node u j Local mode to it by pooling submodules
Figure BDA0003801224340000067
Performing a pooling operation to polymerize a time slice t s Generating a new representation of the feature in (1)
Figure BDA0003801224340000065
Therefore, all local modes
Figure BDA0003801224340000068
The total pooled representation of (a) may be formalized as:
Figure BDA0003801224340000071
each column in (b) corresponds to a pooled representation of the local pattern.
To capture the evolutionary rate associated with cascading short-term outbreaks/falls, the present invention masks by designThe code attention layer incorporates the impact of the local patterns. Dynamically aggregating current time t with a mask attention layer based on a mask attention mechanism j All previous local mode embedding (i.e. pooling representation), generating a concealment vector to summarize the evolution process of all previous modes:
Figure BDA0003801224340000072
g (-) is a linear transformation function, γ = { v (t) = 0 ),v(t 1 ),…,v(t j ),…,v(t L ) Is the set of local dependencies of the evolution rate. The similarity function f (·, ·) is specified as:
Figure BDA0003801224340000073
the intensity function obtaining module parameterizes the intensity function of the hokes process through the global module and the local module to obtain the intensity function lambda (t) of the Hawkesformer. Given a concatenation of information C k And it is at observation time t o Forwarding history within
Figure BDA0003801224340000074
Representing a time stamp t j Before but not including time t j The sequence of (a). Here, the intensity function obtaining module models the continuous dynamics of the cascade diffusion using the following conditional intensity functions:
Figure BDA0003801224340000075
wherein, theta represents the parameter of the model, and the forwarding time t is defined in the interval [ t ] j ,t j+1 ) Where f (x) = β lg (1 + exp (x/β)) is the softplus function, and the parameter β is used to constrain the strength function to be positive; the first current represents the evolution process of the continuous time domain, and alpha controls the importance degree of interpolation nodes in two timestamps; w is a 1 、w 2 The learning weights of the global dependency and the local mode at the time t are respectively, and the strength function lambda (t) represents the inferred text arrival rate at the time t. Both global and local dependencies of the Hawkesformer have expressive power and flexibility, and the design of the intensity function enables the Hawkesformer to capture coupled spatio-temporal dynamics from the continuous time domain.
The popularity prediction module is mainly used for predicting the popularity of the information. According to the Hawkes theory, the survival function at time t is:
Figure BDA0003801224340000076
the function is represented at t n ,t]Without conditional probability of an event occurring in between. Then, we can deduce the probability of observing an event at any time t':
Figure BDA0003801224340000081
finally, the final popularity is predicted through a popularity prediction module by combining the time point process and the output of two levels of attention mechanism modules (a global module and a local module) according to the following formula
Figure BDA0003801224340000082
Figure BDA0003801224340000083
The first term Λ represents the integral of the hokes process, representing the expected magnitude of popularity; t is t o Represents the observation time; t is t p Representing a predicted time; the second term represents the output result of the global embedding and local pattern hidden vector splicing result after passing through the second output network (here, the fully connected neural network FCNN).
The information cascade prediction system based on the transform enhanced Hooke process also comprises an optimization module used for optimizing parameters in the system training process.
A cascade C k Is forwarded toTime
Figure BDA0003801224340000084
The joint likelihood of (c) can be expressed as:
Figure BDA0003801224340000085
wherein
Figure BDA0003801224340000086
Represents until time t j The time set of (c).
Assume that there are N information cascade label samples for training, { C 1 ,C 2 ,…,C N And the optimization module optimizes the loss of the popularity degree and the maximization of the log-likelihood of all information cascades by minimizing Mean Square Logarithm Error (MSLE):
Figure BDA0003801224340000087
y k representing the true popularity of the kth information concatenation label exemplar,
Figure BDA0003801224340000088
and the predicted popularity corresponding to the kth information concatenation label sample is shown.
The Adam gradient descent algorithm is then used to optimize the system network parameters.
The invention also provides an information cascade prediction method based on the Transformer enhanced Hox process, which comprises the following steps:
s1, acquiring user position-wise embedding and time coding, and taking the user position-wise embedding and the time coding as user embedding;
s2, for each user node in user embedding, distinguishing all precursor nodes as a main path node and a non-main path node, respectively obtaining attention probability matrixes of the main path node and the non-main path node, further obtaining the global dependence of each node, and then obtaining the global embedding of all the nodes through full connection processing;
s3, acquiring a local mode of each node according to a set time slice window, and then performing pooling operation on the local mode of each node to obtain pooling representation of each node; then based on a mask attention mechanism, generating a local mode hidden vector of each node;
s4, parameterizing a strength function of the Hox process according to the global embedding vector and the local mode hiding vector;
and S5, integrating the Hox process and the full-connection processing result of the global embedding vector and the local mode hiding vector to generate a popularity prediction result.
The present invention thus achieves a system and method for coupled modeling and prediction of time features and topological features in the continuous time domain by introducing a two-tier attention mechanism into the hokes process. At a first level, a DAG is distinguished according to a main path and a non-main path, and the layer is used for capturing global dependency relationships between nodes, namely coupled time and topological dependency; at the second level, a local pool attention module is proposed to embed the evolution rate of the cascade to simulate bursts and sags of short-term forwarding numbers.
Compared with the prior art, the invention has the following beneficial effects:
1. according to the method, user position-wise embedding and time coding are firstly obtained to serve as user embedding, then a path perception hypothesis is introduced from the view angle of a topological structure, two attention layers are designed from the two angles of global embedding and a local mode, the intensity function of a Hox process is parameterized, and the coupled time and topological random characteristics of an information cascade diffusion process are learned by combining the Hox process, the global embedding and the local mode to predict popularity; the invention expands the traditional Hox process, effectively acquires knowledge from the continuous time domain and improves the popularity prediction accuracy.
2. The invention proposes a two-level attention architecture, which parameterizes the intensity function of the hokes process; firstly, a global module attention layer based on path perception is designed to capture a long-term diffusion dependency relationship between nodes in a dynamic diffusion topology, and then a local module is provided to capture a short-term evolution rate in information cascade.
3. The invention customizes a learnable position-wise user embedding method for the information cascade directed acyclic graph, thereby improving the parameter efficiency and reducing the calculation burden;
4. the method is extremely important for understanding the evolution process of the social network platform and explaining the popular reasons of the cascade; for example, the method and the system can predict the forwarding amount of a certain microblog in a future period of time, explain popular factors, and be used for downstream tasks such as marketing design, rumor prediction, influence maximization and the like.
Drawings
FIG. 1 is a process of information cascade diffusion presenting a Directed Acyclic Graph (DAG). Existing models model the temporal features of node occurrences and the topological features of the occurrences in the graph separately, e.g., temporal features are fed into the RNN and topological features are fed into the GNN. But the two are in a coupling mode in nature, and when the forwarding time of a certain node is determined, the topological structure in the DAG is also determined at the same time; e.g. u 2 Is dependent on u 0 Rather than its previous node u in time 1 (ii) a Thus, DAGs form coupled temporal topological dependencies, whereas separate modeling cannot capture coupled dependencies.
FIG. 2 is a hypothetical of primary and non-primary paths in a global dependency module, distinguished from a chain structure based on an RNN sequence model; this assumption can directly capture the long term dependencies on the primary path (non-primary path).
Fig. 3 is a detailed diagram of the implementation of the assumption of the primary path and the non-primary path in the global dependency module, that is, the implementation of fig. 2.
FIG. 4 is a model diagram of an information cascade prediction system based on a Transformer enhanced Hooke process, which includes an initial user embedding module, a global dependency module, a local dependency module and a prediction module.
FIG. 5 is a visual rendering of an attention probability matrix A representing thermodynamic diagrams (Heatmaps) and cascade structures corresponding to three cascades randomly selected in a Weibo dataset; the diagonal line in the thermodynamic diagram represents the attention of the current node to the current node, and each row represents the attention of the current node to all the predecessor nodes; each entry ranges from [0,1], the darker the entry color, the higher the attention, and the 1 for each row.
Interpretation of terms
Information Cascade (Information Cascade): figure 1 illustrates this process in one example: after a root node issues a piece of public content, friends and followers of the root node can see the public content and then forward the public content one by one. In this way, public content propagates through the edges of the social network and creates information cascades, a typical task of which is to predict the size of a cascade (tweet, microblog, etc.) that is potentially affected after an observation period. The theoretical basis can be found in the literature [ J.Cheng, L.Adamic, P.A.Dow, J.M.Kleinberg, and J.Leskovec.Can cassettes be predictedlin Proc.of WWW,2014 ]
Hokes process (HawkesProcess, HP): the hokes process is a mainstream Point-in-time process (TPPs). HP has three key factors: (1) The influence of the users is greater, and the influence users contribute to the newly forwarded arrival rate, which indicates that the tweets forwarded by the influence users are often forwarded more; (2) A self-excitation mechanism, wherein each forwarding can influence the arrival rate of future new forwarding; and (3) time attenuation effect. The effect of forwarding decays with increasing time. But traditional HPs simplify the arrival rate of events, model the arrival of events using, for example, exponential or power law functions, and make strong assumptions about diffusion mechanisms in cascading events, limiting their learning capabilities on large-scale cascading data.
Self-Attention Mechanism (Self-Attention Mechanism): the self-Attention mechanism is a special case of the Attention mechanism recently proposed in transform [ vaswanni, ashish, et al ] "Attention is all you needed." Advances in neural information processing systems 30 (2017) ], which has been widely applied to various NLP tasks such as machine translation, language modeling and the like, and brings about considerable improvement. It encodes the input sequence by correlating the input tokens, which consist of the matrices Q (Queries), K (Keys) and V (Values). The matrix form of the attention function is:
Figure BDA0003801224340000111
Detailed Description
The invention is further described with reference to the accompanying drawings.
Example 1
For convenience, we will use Twitter (Twitter) concatenation as an example setup in the rest of the invention, but the invention is also applicable to other types of information concatenation, such as micro blogging, academic citations, etc., and we will give examples in the experimental part.
The information cascade prediction system based on the transform enhanced hoxox process provided by this embodiment, as shown in fig. 4, includes a user embedding module, a global dependency module, a local dependency module, a strength function obtaining module, and a popularity prediction module.
(1) User embedded module
And the user embedding module is used for acquiring the user position-wise embedding and the time coding and taking the user position-wise embedding and the time coding as the user embedding.
The user embedded module comprises a Position-wise embedded acquisition submodule and a time code acquisition submodule.
Position-wise embedded acquisition submodule comprising a shared matrix
Figure BDA0003801224340000112
In a cascade, for the j-th forwarded user u j Let v j Representing its one-hot index vector on the shared matrix U, then user U j Position-wise embedding of (A) is determined by (U) v j The index is obtained, wherein
Figure BDA0003801224340000113
Representing the set of one-hot indices embedded by all users. Therefore, based on the shared matrix U, position-wise embedding the acquisition submoduleThe Position-wise embedding of the user to the UV.
The temporal code acquisition submodule characterizes the sequence position by means of a temporal code z, defined as follows:
Figure BDA0003801224340000121
the temporal coding is a variant from the original transform absolute position coding. Wherein
Figure BDA0003801224340000122
Is time coding, D is the dimension of time coding, [ z (t) j )] i Represents the value of the ith dimension of the code, i ∈ [1, D ]]。
Figure BDA0003801224340000123
Figure BDA0003801224340000124
Is a collection of temporal codes.
Thus, the user embedding can ultimately be written as: x = UV + Z. The initial user embedding X consists of a learnable shared position-wise embedding UV together with a fixed time coding Z.
The shared matrix U is initially initialized and its parameters are dynamically updated together by the designed optimization module using back-propagation to get subsequent context embedding related to the underlying cascade.
(2) Global dependency module
And the global dependency module is used for distinguishing all precursor nodes into a main path node and a non-main path node for each user node in the user embedding process, acquiring the attention probability matrix of the main path node and the non-main path node, further acquiring the global dependency of each node, and then performing full connection processing to obtain the global embedding of all the nodes.
The global dependency module includes a path-aware attention layer and a first output network.
The path perception attention layer is used for distinguishing all precursor nodes into a main path node and a non-main path node for each user node in user embedding, acquiring long-term dependence attention scores of the main path node and the non-main path node and constructing an attention probability matrix of the main path and the non-main path; and then obtaining the global dependence according to the attention probability matrix of the main path and the non-main path of each node.
As shown in fig. 2 and 3, a path-aware hypothesis, i.e., a primary path/non-primary path hypothesis, is proposed in the global module. Given a current node u j To obtain it and a certain node in the main path
Figure BDA0003801224340000125
Long term dependent attention score of (c):
Figure BDA0003801224340000126
wherein a, b and
Figure BDA0003801224340000127
respectively representing user embedded matrix X main path Q (Query) matrix
Figure BDA0003801224340000128
Non-primary path Q (Query) matrix
Figure BDA0003801224340000129
And K (Key) matrix
Figure BDA00038012243400001210
The transformed users embed vectors, and the three transformation matrices are used for distinguishing roles of the users in long-term dependence. Wherein,<·,·>represents the inner product function. Similarly, the current node u j A node in the non-main path
Figure BDA00038012243400001211
The long term dependent attention score of (a):
Figure BDA0003801224340000131
then, attention weighting and calculation are performed through the following formula to obtain the user u j Complete diffusion dependence e j
Figure BDA0003801224340000132
And finally, constructing the global dependency of all nodes according to the complete diffusion dependency of each node. The context embedding of each node is obtained according to the attention scores of the primary path and the non-primary path of each node: e = { E = 0 ,e 1 ,…,e j ,…,e L Each line in E represents the global dependency of a certain node in the cascade;
Figure BDA0003801224340000133
embedding matrix X via V (Value) matrix on behalf of user
Figure BDA0003801224340000134
The converted user embeds the vector.
The above is for the current single user u j Is calculated. In practical operation, the above attention is relied on using matrix parallel to speed up the operation.
Based on the path perception attention layer, the attention scores of the current node on the main path and the non-main path are calculated through the following formulas:
Figure BDA00038012243400001314
Figure BDA00038012243400001315
wherein M is P
Figure BDA0003801224340000135
One-hot matrices for preserving the attention scores of the nodes on the primary path and the non-primary path, respectively, if
Figure BDA0003801224340000136
Then element
Figure BDA0003801224340000137
Accordingly, if
Figure BDA0003801224340000138
Finally, the attention probability matrix for the primary and non-primary paths can be derived as:
Figure BDA0003801224340000139
where D is the dimension of the input embedding. Mask matrix
Figure BDA00038012243400001310
To prevent the system from peeping into future data if i<j,M i,j =0, otherwise M i,j And = - ∞. Matrix M forces the softmax function to be only at u j Distributing valid long-term dependent attention on previous forwarders to subsequent users (i.e., later than t) j Active user) is assigned 0. Each column vector in matrix A represents u j To its precursor node
Figure BDA00038012243400001311
Including the main path
Figure BDA00038012243400001312
And nodes on non-primary paths
Figure BDA00038012243400001313
Then, a matrix form of the global dependency of each node is obtained according to the following formula:
E=(W V X)A
in order to learn more of the semantic information,the path perception attention layer is designed into a module based on a multi-head attention mechanism, context is embedded and mapped to different subspaces, and the subspaces can use the matrix multiplication parallel operation given in the foregoing to further calculate a plurality of heads E 1 ,E 2 ,…,E H Then connecting them into E = W O Concat(E 1 ,E 2 ,…,E H ) In which
Figure BDA0003801224340000141
Is an aggregation matrix. And the first output network is used for carrying out full connection processing on the global dependence of all the nodes to obtain the global embedding of all the nodes. In this embodiment, the fully-connected neural network FCNN is used as the first output network, and the global dependencies E of all nodes are sent to the first output network to obtain global embeddings of all nodes:
Figure BDA0003801224340000142
h(t j )=H(:,j)
wherein,
Figure BDA0003801224340000143
is a parameter of the neural network; h (t) j ) Is that
Figure BDA0003801224340000144
J-th column in (1) represents user u j Is indicative of long term dependence.
(3) Local dependency module
The local dependence module is used for acquiring the local mode of each node according to the set time slice window, and then performing pooling operation on the local mode of each node to obtain pooling representation of each node; a hidden vector for each node local pattern is then generated based on the masked attention mechanism.
The local dependency module includes a local pattern acquisition submodule, a pooling submodule, and a masking attention layer.
Local mode acquisition submodule forAnd according to the set time slice window, intercepting the part positioned in front of each node from the global dependency to obtain the local mode of each node. Setting a fixed time slice window t s Utilizing a local mode acquisition submodule to intercept a part positioned in front of each node from the global dependency to obtain a local mode of each node; thus, the local mode of the current node is determined by the current node u j And a global dependency component of some predecessor nodes, denoted as
Figure BDA0003801224340000147
Representing the set of all local patterns of a piece of information concatenation.
And the pooling sub-module is used for performing pooling operation on the local mode of each node to obtain a pooled representation of each node. For each current node u j Local mode to it by pooling submodules
Figure BDA0003801224340000148
Performing a pooling operation to polymerize a time slice t s Generating a new representation form by the features in (1)
Figure BDA0003801224340000145
Therefore, all local modes
Figure BDA0003801224340000149
The total pooled representation of (a) may be formalized as:
Figure BDA0003801224340000146
each column in (a) corresponds to a pooled representation of the local pattern.
And the mask attention layer is used for dynamically aggregating the local mode pooling representation of each node and the nodes before the node, and generating the hidden vector of the local mode of each node. Dynamically aggregating current time t with a mask attention layer based on a mask attention mechanism j All previous local mode embedding (i.e. pooling representation), generating a concealment vector to summarize the evolution process of all previous modes:
Figure BDA0003801224340000151
g (-) is a linear transformation function, γ = { v (t) 01 ),v(t 1 ),…,v(tj),…,v(t L ) Is a set of local dependencies of the evolution rate. The similarity function f (·, ·) is specified as:
Figure BDA0003801224340000152
(4) Intensity function acquisition module
And the strength function acquisition module is used for parameterizing the strength function of the traditional Hox process according to the global embedding vector and the local mode hiding vector. Here, the intensity function obtaining module models the continuous dynamics of the cascade diffusion using the following conditional intensity functions:
Figure BDA0003801224340000153
wherein, theta represents the parameter of the model, and the forwarding time t is defined in the interval [ t ] j ,t j+1 ) Where f (x) = β lg (1 + exp (x/β)) is the softplus function, and the parameter β is used to constrain the strength function to be positive; the first current represents the evolution process of the continuous time domain, and alpha controls the importance degree of interpolation nodes in two timestamps; w is a 1 、w 2 The learning weights of the global dependency and the local mode at the time t are respectively, and lambda (t) represents the inferred text arrival rate at the time t.
(5) Popularity prediction module
And the popularity prediction module is used for generating a popularity prediction result by combining the integral of the Hox process and the full-connection processing result of the global embedding and local mode hiding vectors. Here, the final popularity is predicted according to the following formula
Figure BDA0003801224340000154
Figure BDA0003801224340000155
The first term Λ represents the integral of the hokes process, representing the desired magnitude of popularity; t is t o Represents the observation time; t is t p Representing a predicted time; the second term represents the output result of the global embedding and local pattern hidden vector splicing result after passing through the second output network (here, the fully connected neural network FCNN).
Example 2
The embodiment is a further improvement on the embodiment 1.
The information cascade prediction system based on the transform enhanced hoxon process provided by the embodiment further includes an optimization module, which is used for parameter optimization in a system training process.
Assume that there are N information cascade label samples for training, { C 1 ,C 2 ,…,C N And the optimization module optimizes the loss of the popularity degree and the maximization of the log-likelihood of all information cascades by minimizing Mean Square Logarithm Error (MSLE):
Figure BDA0003801224340000161
y k representing the true popularity of the kth information concatenation label exemplar,
Figure BDA0003801224340000162
and the predicted popularity corresponding to the kth information concatenation label sample is represented.
The Adam gradient descent algorithm is then used to optimize the system network parameters.
Example 3
The present embodiment provides an information cascade prediction method based on a transform enhanced hokes process, which uses the information cascade prediction system of claim 1, and comprises the following steps:
s1, acquiring user position-wise embedding and time coding, and taking the user position-wise embedding and the time coding as user embedding.
In the step, a user embedding module is used for respectively acquiring a sharing position-wise embedding UV and a fixed time code Z, and a final user embedding X = UV + Z is obtained.
S2, for each user node in user embedding, distinguishing all the precursor nodes as a main path node and a non-main path node, respectively obtaining long-term dependence attention scores of the main path node and the non-main path node, further obtaining the global dependence of each node, and then obtaining the global embedding of all the nodes through full connection processing.
Firstly, distinguishing all predecessor nodes as a main path node and a non-main path node for each user node in user embedding by utilizing a path perception attention layer, and acquiring attention probability matrixes of the main path node and the non-main path node; then constructing global dependence of all nodes according to the attention probability matrix of the main path and the non-main path of each node; and then the global embedding of all the nodes is obtained through full connection processing of the first output network.
S3, acquiring a local mode of each node according to a set time slice window, and then performing pooling operation on the local mode of each node to obtain pooling representation of each node; then, based on the masking attention mechanism, a local mode hidden vector for each node is generated.
Firstly, utilizing a local mode acquisition submodule to intercept a part positioned in front of each node from global dependence according to a set time slice window to obtain a local mode of each node; then, performing pooling operation on the local mode of each node through a pooling submodule to obtain pooling representation of each node; and dynamically aggregating each node and the local mode pooling representation of the nodes before the node by using the mask attention layer to generate the hidden vector of the local mode of each node.
S4 is used to parameterize the strength function of the hokes process in terms of global embedding and local pattern concealment vectors.
Here, the strength function of the traditional hokes process is parameterized by using a strength function acquisition module according to the global embedding and local mode hiding vectors, and the strength function of the Hawkesformer model is obtained.
And S5, integrating the Hox process and the full-connection processing result of the global embedding vector and the local mode hiding vector to generate a popularity prediction result.
Here, the popularity prediction module combines the integral of the hokes process and the full-concatenation processing result of the global embedding and local pattern hiding vectors to generate a popularity prediction result.
Application example
The present application uses three different authentic datasets (Twitter, weibo, and APS, first dataset origin reference [ Lilian Weng, filippo Menczer, and Yong-YooloAhn.2013. Visual prediction and community structure in social networks. Scientific Reports3 (2013) ] second dataset origin reference [ Qi Cao, huawei Shen, keting Cen, wentao Ouyang, and Xueqi Cheng.2017. Deep-Hawkeys: bridging the gateway prediction and interpretation of information sources. CIKM.1149-1158 ] third dataset origin reference/transitions.
The three data sets are divided into a training set, a verification set and a test set according to the proportion of 70%, 15% and 15% respectively.
Firstly, training and optimizing the information cascade prediction system (Hawkesformer) based on the transform enhanced hokes process provided in embodiment 2 by using a training set, and verifying the trained system by using a verification set every time the training is finished until the loss function of the system is reached
Figure BDA0003801224340000171
And tends to be stable.
Then, the trained Hawkesformer is tested according to the information cascade prediction method based on the transform enhanced hokes process provided in example 3, and compared with 7 different baseline models (Feature-Deep, deep hawkes, casCN, forrest, vaCas, tempCas, hawkesformer), and a common measurement method MSLE is adopted (the smaller the value, the better the prediction effect).
Table 1: performance comparison of Hawkesformer with baseline model on three datasets
Figure BDA0003801224340000172
Note: the bold values indicate better performance than other settings. All experimental results were best in 10 trials with statistical significance ρ <0.05.
The 7 baseline methods in table 1 are presented below:
Feature-Deep: structural features, temporal features, user features are extracted and input into the two-layer MLP. [ Xu, xovee, et al. "Casflow: expanding resonant structures and propagation unreserved for casting prediction" IEEE Transactions on Knowledge and Data Engineering (2021) "]
DeepHawkes: the deep learning and the Hawkes self-excitation point process are combined, and the gap between the prediction performance and the interpretability is closed. [ Qi Cao, huawei Shen, keting Cen, wentao Ouyang, and Xueqi Cheng.2017.Deep-Hawkes: bridging the gap between prediction and interpretation of information and casting. InCIKM.1149-1158 ]
And (3) CasCN: is a hybrid model, which uses RNN to simulate the evolution process of diffusion by stacking GNNs to obtain node embedding. [ Chen, xueqin, et al. "Information differentiation prediction video recording cassettes conjugation." 2019IEEE 35th international conjugation on data engineering (ICDE). IEEE,2019 ]
FOREST: the multi-scale cascade prediction problem is handled in combination with reinforcement learning and RNN. "Multi-scale Information Diffusion with a Reinforced Recurrent networks" IJCAI.2019 "by Yang, cheng, et al
VaCas: the hierarchical diffusion model and the space-time structure characteristics of information cascade are integrated, and diffusion uncertainty is captured. [ Fan Zhou, xovee Xu, kunpeng Zhang, gocetrajcevski, and Ting Zhong.2020.Variation information diffusion for probabilistic cases compression. ININFOCOM.1618-1627 ]
The TempCas: a heuristic approach is implemented using RNNs to embed the cascade graphs and use LSTM on specially designed attention-based CNNs to learn historical short-term variation trends. "proceeding of the AAAI Conference on Artificial Intelligence insight. Vol.35.No. 1.2021" ]
From the experimental results in table 1, it can be seen that the information cascade prediction system based on the transform enhanced hokes process provided by the invention can greatly improve the accuracy of popularity prediction compared with other baseline models.
In addition, in order to study the effectiveness of the global dependency module provided by the present invention in capturing the long-term diffusion dependency relationship between nodes in the dynamic diffusion topology, the present application example further obtains an attention probability matrix a, as shown in fig. 5. The stepwise analysis was performed as follows:
(1) For smaller (13 users) information concatenation C 1 The information cascade prediction system provided by the invention focuses more on the main path and early users participating in diffusion. For each row of matrix A, we can see that the users on the main path usually have the most prominent color, which verifies our dominant/non-dominant hypothesis in the information propagation (a similar phenomenon is also shown in cascade C) 2 And C 3 ). For cascade C 1 Node 9, hawkesformer detects that 0 → 2 → 9 is the primary path, indicating that node 0 and node 2 have a greater impact on node 9, while other nodes on the non-primary path have a lesser impact on the forwarding behavior of node 9. For node 8, the hawkesformer detects that 0 → 4 → 8 is the primary path, indicating that node 0 and node 4 have a greater impact on node 8, while other nodes on the non-primary node have a lesser impact on the forwarding behavior of node 8.
(2) Information concatenation C for medium (58 users) 2 Long-term dependencies between the root user and subsequent nodes are observed. This suggests that long-term dependencies between nodes in a dynamic diffusion topology need to be modeled, because sequence models such as RNNs are difficult to train on long sequences,and when the sequence is too long it faces the gradient disappearance/explosion problem. In addition, several short-term bursts were observed, which again validates the assumption of the present invention to learn local patterns for information concatenation.
(3) At the same time, a virus transmission cascade C is shown 3 It has 106 users and a maximum depth of 8. Most pushups in this cascade (84%) are from indirect users to the root. We have found that even "tail" users at the end of the diffusion tree may trigger outbreaks (e.g., users 19 and 43). Some users act as hubs to connect other influential users bringing them into the information dissemination process.
Therefore, the information cascade prediction system based on the transform enhanced Hooke process, which is provided by the invention, not only can improve the accuracy of popularity prediction, but also has strong interpretability. Experiments on three real data sets demonstrate the superior performance of the present invention over the most advanced baseline model. The improvement of the performance of the proposed scheme shows that the combination of the advantages of deep learning (Transformer) and a probability model (hokes process) is a new direction for effectively simulating the information cascade diffusion process and improving the prediction accuracy.
In summary, the modeling mode of the information propagation process is important for understanding the information propagation mechanism and the popularity prediction task, the evolution of information diffusion presents a Directed Acyclic Graph (DAG), and the topological features and time are coupled in nature, but the two most important features (topological features and time features) of the cascade are modeled separately in the existing research, so that cross-domain information is lost, and the expression and prediction capability of the model is reduced. The method links a multi-level attention mechanism with the Hox self-excitation point process to realize more effective and reasonable modeling of information cascade so as to achieve more accurate popularity prediction. In particular, using a two-layer attention structure to parameterize the intensity function of the hokes process has been achieved to efficiently acquire representation knowledge from a continuous time domain.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.

Claims (10)

1.一种基于Transformer增强霍克斯过程的信息级联预测系统,其特征在于,包括:1. An information cascade prediction system based on Transformer enhanced Hawkes process, characterized in that, comprising: 用户嵌入模块,用于获取用户position-wise嵌入和时间编码,并以用户position-wise嵌入和时间编码作为用户嵌入;The user embedding module is used to obtain user position-wise embedding and time code, and use user position-wise embedding and time code as user embedding; 全局依赖模块,用于对用户嵌入中每个用户节点,区分所有前驱节点为主路径节点和非主路径节点,并获取主路径节点和非主路径节点的注意力概率矩阵,进而获取每个节点的全局依赖,再经全连接得到所有节点的全局嵌入;The global dependency module is used to distinguish all precursor nodes from main path nodes and non-main path nodes for each user node in user embedding, and obtain the attention probability matrix of main path nodes and non-main path nodes, and then obtain each node The global dependence of , and then get the global embedding of all nodes through full connection; 局部依赖模块,用于依据设定的时间片窗口,获取每个节点的局部模式,然后对每个节点的局部模式进行池化操作得到每个节点的池化表示;最后基于掩码注意力机制,生成每个节点局部模式的隐藏向量;The local dependency module is used to obtain the local pattern of each node according to the set time slice window, and then perform pooling operation on the local pattern of each node to obtain the pooled representation of each node; finally, based on the mask attention mechanism , generating hidden vectors for each node’s local pattern; 强度函数获取模块,用于依据全局嵌入和局部模式隐藏向量来参数化霍克斯过程的强度函数;An intensity function acquisition module for parameterizing the intensity function of the Hawkes process according to the global embedding and the local pattern hidden vector; 流行度预测模块,用于结合霍克斯过程的积分,以及全局嵌入和局部模式隐藏向量的全连接处理结果生成流行度预测结果。The popularity prediction module is used to combine the integral of the Hawkes process, and the fully connected processing results of the global embedding and the local pattern hidden vector to generate the popularity prediction result. 2.根据权利要求1所述的基于Transformer增强霍克斯过程的信息级联预测系统,其特征在于,所述用户嵌入模块包括Position-wise嵌入获取子模块和时间编码获取子模块;2. the information cascading prediction system based on Transformer enhanced Hawkes process according to claim 1, is characterized in that, described user embedding module comprises Position-wise embedding acquisition submodule and time code acquisition submodule; 所述Position-wise嵌入获取子模块,包括共享矩阵U,在一个级联中,对于第j次转发的用户uj,让vj代表它在共享矩阵U上的one-hot索引向量,那么用户uj的position-wise嵌入就由Uvj索引得到,其中V代表所有用户嵌入的one-hot索引集合;所述Position-wise嵌入获取子模块得到的用户Position-wise嵌入为UV;The Position-wise embedding acquisition sub-module includes the shared matrix U. In a cascade, for the user u j forwarded for the jth time, let v j represent its one-hot index vector on the shared matrix U, then the user The position-wise embedding of u j is obtained by Uv j index, wherein V represents the one-hot index set embedded by all users; the user Position-wise embedding obtained by the Position-wise embedding acquisition sub-module is UV; 时间编码获取子模块利用时间编码z来表征序列位置,定义如下:The time code acquisition submodule uses the time code z to represent the sequence position, which is defined as follows:
Figure FDA0003801224330000011
Figure FDA0003801224330000011
其中
Figure FDA0003801224330000012
是时间编码,D是时间编码的维度,[z(tj)]i代表编码第i个维度的值,
Figure FDA0003801224330000013
是时间编码的集合;
in
Figure FDA0003801224330000012
is the time encoding, D is the dimension of the time encoding, [z(t j )] i represents the value of the i-th dimension of the encoding,
Figure FDA0003801224330000013
is a collection of time codes;
用户嵌入最终记为:X=UV+Z。The user embedding is finally written as: X=UV+Z.
3.根据权利要求1所述的基于Transformer增强霍克斯过程的信息级联预测系统,其特征在于,所述全局依赖模块包括:3. the information cascading prediction system based on Transformer enhanced Hawkes process according to claim 1, is characterized in that, described global dependence module comprises: 路径感知注意力层,用于对用户嵌入中每个用户节点,区分所有前驱节点为主路径节点和非主路径节点,并获取主路径节点和非主路径节点的长期依赖注意力分数,构建主路径和非主路径的注意力概率矩阵;再依据每个节点的主路径和非主路径的注意力概率矩阵,获取其全局依赖;The path-aware attention layer is used for each user node in the user embedding, distinguishing all the precursor nodes from the main path nodes and non-main path nodes, and obtaining the long-term dependent attention scores of the main path nodes and non-main path nodes, and constructing the main path node. The attention probability matrix of the path and non-main path; and then obtain its global dependence according to the attention probability matrix of the main path and non-main path of each node; 第一输出网络,用于对每个节点的全局依赖进行全连接处理,得到所有节点的全局嵌入。The first output network is used to perform full-connection processing on the global dependencies of each node to obtain the global embedding of all nodes. 4.根据权利要求3所述的基于Transformer增强霍克斯过程的信息级联预测系统,其特征在于,首先,基于路径感知注意力层,通过以下公式计算出当前节点分别在主路径、非主路径上的注意力分数:4. the information cascading prediction system based on Transformer enhanced Hawkes process according to claim 3, it is characterized in that, at first, based on the path-aware attention layer, the following formula is used to calculate the current node respectively in the main path, non-main path Attention scores on paths: Pscore=(WPX)T(WKX)⊙MP P score =(W P X) T (W K X)⊙M P Nscore=(WNX)T(WKX)⊙MN N score =(W N X) T (W K X)⊙M N 其中MP
Figure FDA0003801224330000021
分别用来保留主路径和非主路径上节点的注意力分数的one-hot矩阵,如果
Figure FDA0003801224330000029
Figure FDA0003801224330000028
表示主路径节点集合,则元素
Figure FDA00038012243300000210
相应的,如果
Figure FDA00038012243300000211
Figure FDA00038012243300000212
表示非主路径节点集合,
Figure FDA0003801224330000022
最终,主路径和非主路径的注意力概率矩阵推导为:
where M P ,
Figure FDA0003801224330000021
The one-hot matrix used to retain the attention scores of nodes on the main path and non-main path, respectively, if
Figure FDA0003801224330000029
Figure FDA0003801224330000028
Represents the main path node set, the element
Figure FDA00038012243300000210
Correspondingly, if
Figure FDA00038012243300000211
Figure FDA00038012243300000212
Represents the set of non-primary path nodes,
Figure FDA0003801224330000022
Finally, the attention probability matrix of the main path and non-main path is derived as:
Figure FDA0003801224330000023
Figure FDA0003801224330000023
其中,D是输入嵌入的维数;掩码矩阵
Figure FDA0003801224330000024
用来防止系统偷窥未来的数据,如果i<j,Mi,j=0,否则Mi,j=-∞;矩阵A中的每个列向量代表uj对其先驱节点
Figure FDA00038012243300000213
的注意力,包括主路径
Figure FDA00038012243300000215
和非主路径上的节点
Figure FDA00038012243300000214
where D is the dimensionality of the input embedding; the mask matrix
Figure FDA0003801224330000024
It is used to prevent the system from peeping future data. If i<j, M i,j =0, otherwise M i,j =-∞; each column vector in matrix A represents u j to its pioneer node
Figure FDA00038012243300000213
attention, including the main path
Figure FDA00038012243300000215
and nodes on non-primary paths
Figure FDA00038012243300000214
然后,通过用户全局依赖提取子模块得到每个节点全局依赖的矩阵形式:E=(WVX)A;Then, the matrix form of each node's global dependency is obtained through the user's global dependency extraction submodule: E=(W V X)A; 通过第一输出网络对E进行全连接,得到所有级联中节点的全局嵌入:E is fully connected through the first output network to obtain the global embedding of nodes in all cascades:
Figure FDA0003801224330000025
Figure FDA0003801224330000025
h(tj)=H(:,j)h(t j )=H(:,j) 其中,
Figure FDA0003801224330000026
是神经网络的参数;h(tj)是
Figure FDA0003801224330000027
中的第j列,代表用户uj的长期依赖表示。
in,
Figure FDA0003801224330000026
is the parameter of neural network; h(t j ) is
Figure FDA0003801224330000027
The jth column in represents the long-term dependence of user u j .
5.根据权利要求1至4任一项所述的基于Transformer增强霍克斯过程的信息级联预测系统,其特征在于,所述局部依赖模块包括:5. The information cascading prediction system based on Transformer enhanced Hawkes process according to any one of claims 1 to 4, characterized in that, said local dependence module comprises: 局部模式获取子模块,用于依据设定的时间片窗口,从全局依赖中截取位于每个节点之前的部分,得到每个节点的局部模式;The local mode acquisition sub-module is used to intercept the part before each node from the global dependency according to the set time slice window, and obtain the local mode of each node; 池化子模块,用于对每个节点的局部模式进行池化操作得到每个节点的池化表示;The pooling sub-module is used to perform a pooling operation on the local pattern of each node to obtain a pooling representation of each node; 掩码注意力层,用于动态聚合每个节点和与其之前节点的局部模式池化表示,生成每个节点局部模式的隐藏向量。The masked attention layer is used to dynamically aggregate the local pattern pooling representation of each node and its previous nodes to generate the hidden vector of each node's local pattern. 6.根据权利要求5所述的基于Transformer增强霍克斯过程的信息级联预测系统,其特征在于,利用掩码注意力层动态聚合当前时间tj之前所有的局部模式嵌入,生成一个隐藏向量来概括所有先前模式的演化过程:6. the information cascade prediction system based on Transformer enhanced Hawkes process according to claim 5, is characterized in that, utilizes mask attention layer to dynamically aggregate all local pattern embeddings before current time tj , generates a hidden vector To summarize the evolution of all previous schemas:
Figure FDA0003801224330000031
Figure FDA0003801224330000031
g(·)是一个线性变换函数,γ={v(t0),v(t1),…,v(tj),…,v(tL)}是演化速率的局部依赖的集合;相似函数f(·,·)指定为:g( ) is a linear transformation function, γ={v(t 0 ),v(t 1 ),…,v(t j ),…,v(t L )} is a collection of local dependencies of the evolution rate; The similarity function f(·,·) is specified as:
Figure FDA0003801224330000032
Figure FDA0003801224330000032
7.根据权利要求6所述的基于Transformer增强霍克斯过程的信息级联预测系统,其特征在于,强度函数获取模块使用以下条件强度函数对级联扩散的连续动态进行建模:7. The information cascade prediction system based on Transformer enhanced Hawkes process according to claim 6, wherein the intensity function acquisition module uses the following conditional intensity function to model the continuous dynamics of cascade diffusion:
Figure FDA0003801224330000033
Figure FDA0003801224330000033
其中,Θ代表模型的参数,转发时间t定义在区间[tj,tj+1),其中f(x)=βlg(1+exp(x/β))是softplus函数,参数β用于约束强度函数为正;第一项current表示连续时域的演化过程,α控制两个时间戳内的插值节点的重要程度;w1、w2分别是全局依赖和局部模式在t时刻的学习权重,λ(t)表示t时刻的推文到达率。Among them, Θ represents the parameters of the model, and the forwarding time t is defined in the interval [t j ,t j+1 ), where f(x)=βlg(1+exp(x/β)) is the softplus function, and the parameter β is used to constrain The intensity function is positive; the first term current represents the evolution process in the continuous time domain, and α controls the importance of the interpolation nodes within the two time stamps; w 1 and w 2 are the learning weights of global dependencies and local patterns at time t, respectively, λ(t) represents the arrival rate of tweets at time t.
8.根据权利要求7所述的基于Transformer增强霍克斯过程的信息级联预测系统,其特征在于,流行度预测模块按照以下公式来预测最终的流行度
Figure FDA0003801224330000034
8. the information cascade prediction system based on Transformer enhanced Hawkes process according to claim 7, is characterized in that, popularity prediction module predicts final popularity according to the following formula
Figure FDA0003801224330000034
Figure FDA0003801224330000035
Figure FDA0003801224330000035
第一项Λ表示霍克斯过程的积分,表示期望的流行度大小;to表示观测时间;tp表示预测时间;第二项表示将全局嵌入和局部模式隐藏向量拼接结果经第二输出网络后的输出结果。The first term Λ represents the integral of the Hawkes process, which represents the expected popularity; t o represents the observation time; t p represents the prediction time; the second term represents the splicing result of the global embedding and local pattern hidden vector through the second output network After the output results.
9.根据权利要求8所述的基于Transformer增强霍克斯过程的信息级联预测系统,其特征在于,还包括优化模块,用于系统训练过程中的参数优化;优化模块通过最小化均方对数误差对流行度的损失和最大化所有信息级联的对数似然进行优化:9. the information cascade prediction system based on Transformer enhanced Hawkes process according to claim 8, is characterized in that, also comprises optimization module, is used for the parameter optimization in system training process; Optimization module is by minimizing mean square pair The logarithmic error optimizes the loss of popularity and maximizes the log-likelihood of all information cascades:
Figure FDA0003801224330000041
Figure FDA0003801224330000041
yk表示第k条信息级联标签样本的真实流行度,
Figure FDA0003801224330000042
表示第k条信息级联标签样本对应的预测流行度,l(Ck)表示一条级联Ck的联合似然,N表示信息级联标签样本数量。
y k represents the true popularity of the kth information cascade label sample,
Figure FDA0003801224330000042
Indicates the predicted popularity corresponding to the kth information cascade label sample, l(C k ) indicates the joint likelihood of a cascade C k , and N indicates the number of information cascade label samples.
10.一种基于Transformer增强霍克斯过程的信息级联预测方法,其包括以下步骤:10. An information cascade prediction method based on Transformer enhanced Hawkes process, which comprises the following steps: S1获取用户position-wise嵌入和时间编码,并以用户position-wise嵌入和时间编码作为用户嵌入;S1 obtains user position-wise embedding and time code, and uses user position-wise embedding and time code as user embedding; S2对用户嵌入中每个用户节点,区分所有前驱节点为主路径节点和非主路径节点,并分别获取主路径节点和非主路径节点的长期依赖注意力分数,进而获取每个节点的全局依赖,再经全连接处理得到所有节点的全局嵌入;S2 For each user node in the user embedding, distinguish all the precursor nodes as main path nodes and non-main path nodes, and obtain the long-term dependency attention scores of the main path nodes and non-main path nodes respectively, and then obtain the global dependency of each node , and then get the global embedding of all nodes through full connection processing; S3依据设定的时间片窗口,获取每个节点的局部模式,然后对每个节点的局部模式进行池化操作得到每个节点的池化表示;然后基于掩码注意力机制,生成每个节点的局部模式隐藏向量;S3 obtains the local pattern of each node according to the set time slice window, and then performs pooling operation on the local pattern of each node to obtain the pooled representation of each node; then generates each node based on the mask attention mechanism The local pattern hidden vector of ; S4用于依据全局嵌入和局部模式隐藏向量来参数化霍克斯过程的强度函数;S4 is used to parameterize the strength function of the Hawkes process in terms of the global embedding and the local pattern hidden vector; S5用于结合霍克斯过程的积分,以及全局嵌入和局部模式隐藏向量的全连接处理结果生成流行度预测结果。S5 is used to combine the integral of the Hawkes process, and the fully connected processing results of the global embedding and the local pattern hidden vector to generate the popularity prediction result.
CN202210983826.9A 2022-08-17 2022-08-17 Information Cascade Prediction System and Method Based on Transformer Enhanced Hawkes Process Active CN115409155B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210983826.9A CN115409155B (en) 2022-08-17 2022-08-17 Information Cascade Prediction System and Method Based on Transformer Enhanced Hawkes Process

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210983826.9A CN115409155B (en) 2022-08-17 2022-08-17 Information Cascade Prediction System and Method Based on Transformer Enhanced Hawkes Process

Publications (2)

Publication Number Publication Date
CN115409155A true CN115409155A (en) 2022-11-29
CN115409155B CN115409155B (en) 2026-01-09

Family

ID=84159909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210983826.9A Active CN115409155B (en) 2022-08-17 2022-08-17 Information Cascade Prediction System and Method Based on Transformer Enhanced Hawkes Process

Country Status (1)

Country Link
CN (1) CN115409155B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115759482A (en) * 2022-12-29 2023-03-07 人民网股份有限公司 Social media content propagation prediction method and device
CN115858899A (en) * 2022-12-14 2023-03-28 中国人民解放军国防科技大学 Network event label popularity prediction method based on multi-label influence
CN116596025A (en) * 2023-04-17 2023-08-15 苏州大学 A method and system for predicting information popularity based on cascading relationship diversity
CN116842398A (en) * 2023-06-27 2023-10-03 哈尔滨工业大学 A topic-aware information forwarding prediction method and system based on shielded self-attention network
CN117268763A (en) * 2023-09-21 2023-12-22 安徽大学 A rolling bearing life prediction method and system
CN119005321A (en) * 2024-10-23 2024-11-22 中国计量大学 Electromechanical equipment predictive maintenance method integrating time sequence knowledge graphs
CN119337704A (en) * 2024-09-23 2025-01-21 中国船舶集团有限公司第七一九研究所 Online opinion evolution simulation method and device based on offline event incentive model
CN119558490A (en) * 2025-01-27 2025-03-04 北京邮电大学 A method and system for predicting early information dissemination based on diffusion model
CN119558751A (en) * 2024-12-02 2025-03-04 深圳大学 A supply chain distribution path planning method based on reinforcement learning
CN119692524A (en) * 2023-11-14 2025-03-25 电子科技大学 A multimodal social media popularity prediction method based on time-aware hypergraph learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160371598A1 (en) * 2015-06-18 2016-12-22 TCL Research America Inc. Unified attractiveness prediction framework based on content impact factor
CN114037549A (en) * 2021-10-09 2022-02-11 北京交通大学 Online generated content popularity prediction method based on neighbor perception representation learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160371598A1 (en) * 2015-06-18 2016-12-22 TCL Research America Inc. Unified attractiveness prediction framework based on content impact factor
CN114037549A (en) * 2021-10-09 2022-02-11 北京交通大学 Online generated content popularity prediction method based on neighbor perception representation learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张志扬;张凤荔;陈学勤;王瑞锦;: "基于分层注意力的信息级联预测模型", 计算机科学, no. 06, 15 June 2020 (2020-06-15) *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115858899A (en) * 2022-12-14 2023-03-28 中国人民解放军国防科技大学 Network event label popularity prediction method based on multi-label influence
CN115759482A (en) * 2022-12-29 2023-03-07 人民网股份有限公司 Social media content propagation prediction method and device
CN116596025A (en) * 2023-04-17 2023-08-15 苏州大学 A method and system for predicting information popularity based on cascading relationship diversity
CN116842398A (en) * 2023-06-27 2023-10-03 哈尔滨工业大学 A topic-aware information forwarding prediction method and system based on shielded self-attention network
CN117268763A (en) * 2023-09-21 2023-12-22 安徽大学 A rolling bearing life prediction method and system
CN117268763B (en) * 2023-09-21 2024-09-27 安徽大学 A rolling bearing life prediction method and system
CN119692524A (en) * 2023-11-14 2025-03-25 电子科技大学 A multimodal social media popularity prediction method based on time-aware hypergraph learning
CN119337704A (en) * 2024-09-23 2025-01-21 中国船舶集团有限公司第七一九研究所 Online opinion evolution simulation method and device based on offline event incentive model
CN119337704B (en) * 2024-09-23 2025-10-31 中国船舶集团有限公司第七一九研究所 Online viewpoint evolution simulation method and device based on offline event excitation model
CN119005321A (en) * 2024-10-23 2024-11-22 中国计量大学 Electromechanical equipment predictive maintenance method integrating time sequence knowledge graphs
CN119005321B (en) * 2024-10-23 2025-04-25 中国计量大学 Electromechanical equipment predictive maintenance method integrating time sequence knowledge graphs
CN119558751A (en) * 2024-12-02 2025-03-04 深圳大学 A supply chain distribution path planning method based on reinforcement learning
CN119558490A (en) * 2025-01-27 2025-03-04 北京邮电大学 A method and system for predicting early information dissemination based on diffusion model

Also Published As

Publication number Publication date
CN115409155B (en) 2026-01-09

Similar Documents

Publication Publication Date Title
CN115409155A (en) Information cascade prediction system and method based on Transformer enhanced Hooke process
Ma et al. Learn to forget: Machine unlearning via neuron masking
Yuan et al. Jointly embedding the local and global relations of heterogeneous graph for rumor detection
Luo et al. Privacy-preserving clustering federated learning for non-IID data
CN113919441A (en) Classification method based on hypergraph transformation network
CN113850399B (en) A method for federated learning membership inference based on prediction confidence sequence
CN114116995B (en) Session recommendation method, system and medium based on enhanced graph neural network
CN115660147A (en) A method and system for predicting information dissemination based on influence modeling between propagation paths and within propagation paths
CN114048328B (en) Knowledge-graph link prediction method and system based on conversion hypothesis and message transmission
CN110083748A (en) A kind of searching method based on adaptive Dynamic Programming and the search of Monte Carlo tree
CN115309647A (en) Federal learning-based software defect prediction privacy protection method
Xie et al. L-BGNN: Layerwise trained bipartite graph neural networks
Zhao et al. Predicting information diffusion via deep temporal convolutional networks
CN119005403A (en) Information propagation multitasking prediction method, equipment, medium and product
Yu et al. Influence-aware graph neural networks
Che et al. Across-platform detection of malicious cryptocurrency accounts via interaction feature learning
Feng et al. Multi-level contrastive learning on weak social networks for information diffusion prediction
CN116523118B (en) Multi-source information propagation prediction method and system based on heterogeneous graph neural network
CN114360730A (en) Microbe-disease association prediction method based on multi-view graph convolutional network
CN111259264B (en) Time sequence scoring prediction method based on generation countermeasure network
CN112884045A (en) Classification method of random edge deletion embedded model based on multiple visual angles
CN119962727B (en) Information propagation prediction method and system based on hypergraph neural network combined with text
Chen et al. Community Detection Based on DeepWalk Model in Large‐Scale Networks
Xu et al. Adaptive sparse federated learning in large output spaces via hashing
Yang et al. Session-based recommendation with graph neural networks for repeat consumption

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant