Deprecated: The each() function is deprecated. This message will be suppressed on further calls in /home/zhenxiangba/zhenxiangba.com/public_html/phproxy-improved-master/index.php on line 456
CN115409155B - Information Cascade Prediction System and Method Based on Transformer Enhanced Hawkes Process - Google Patents
[go: Go Back, main page]

CN115409155B - Information Cascade Prediction System and Method Based on Transformer Enhanced Hawkes Process - Google Patents

Information Cascade Prediction System and Method Based on Transformer Enhanced Hawkes Process

Info

Publication number
CN115409155B
CN115409155B CN202210983826.9A CN202210983826A CN115409155B CN 115409155 B CN115409155 B CN 115409155B CN 202210983826 A CN202210983826 A CN 202210983826A CN 115409155 B CN115409155 B CN 115409155B
Authority
CN
China
Prior art keywords
node
embedding
user
main path
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210983826.9A
Other languages
Chinese (zh)
Other versions
CN115409155A (en
Inventor
周帆
余柳
钟婷
徐增
匡平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202210983826.9A priority Critical patent/CN115409155B/en
Publication of CN115409155A publication Critical patent/CN115409155A/en
Application granted granted Critical
Publication of CN115409155B publication Critical patent/CN115409155B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/40Business processes related to social networking or social networking services
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0202Market predictions or forecasting for commercial activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

本发明公开了一种基于Transformer增强霍克斯过程的信息级联预测系统及方法,包括用户嵌入模块、全局依赖模块、局部依赖模块、强度函数获取模块和流行度预测模块,首先获取用户position‑wise嵌入和时间编码作为用户嵌入,然后以拓扑结构的视角,引入路径感知假设,并从全局嵌入和局部模式两个角度设计两层注意力层,参数化霍克斯过程的强度函数,并结合霍克斯过程、全局嵌入和局部模式,学习信息级联扩散过程耦合的时间和拓扑随机特性,以进行流行度预测;本发明扩展了传统的霍克斯过程,并有效地从连续时间域中获取知识,提升流行度预测准确性。

This invention discloses an information cascading prediction system and method based on a Transformer-enhanced Hawkes process, comprising a user embedding module, a global dependency module, a local dependency module, an intensity function acquisition module, and a popularity prediction module. First, user position-wise embeddings and temporal encodings are acquired as user embeddings. Then, from a topological perspective, a path-aware hypothesis is introduced, and two attention layers are designed from the perspectives of global embedding and local patterns. The intensity function of the Hawkes process is parameterized, and by combining the Hawkes process, global embedding, and local patterns, the temporal and topological stochastic characteristics of the information cascading diffusion process are learned for popularity prediction. This invention extends the traditional Hawkes process and effectively acquires knowledge from the continuous time domain, improving the accuracy of popularity prediction.

Description

Information cascade prediction system and method based on transform reinforced Hox process
Technical Field
The invention belongs to the technical field of information processing, relates to information diffusion (Information diffusion), information cascading (Information Cascade) and popularity prediction (Popularity Prediction) based on deep learning (DEEP LEARNING), and particularly relates to a method and a system (Hawkesformer) for introducing a transducer into a Hoxwell process (Hawkes Process) to perform coupling modeling on cascading time characteristics and topological structures and finally realizing popularity prediction (Cascade Prediction).
Background
Content sharing through Twitter, facebook and social media platforms such as microblogs has become a main channel for individuals to express opinions and respond to various topics. This trend produces a large amount of data describing the way information is propagated and provides unprecedented convenience in generating, delivering, and propagating information items. The initial distribution of information (e.g., tweet) and the subsequent sharing forwarding (e.g., retweet) form a cascade of information representing what is known as an information dissemination process. Similar phenomena are found in other (non-social media) settings, paper quotation, blog space, and email forwarding. The large number of user sharing activities contributes to the rapid and massive propagation of the information cascade. A typical task of information cascading is to predict a certain cascade (tweet, microblog, etc.), which after a certain period of time, is the scale of the potentially affected users, i.e. popularity prediction (size of the information cascade), which can bring great economic and social value in many downstream applications. For example, knowing what type of advertisement is maximally influential may make advertisement delivery more accurate, predicting potential impact users of rumors, allowing administrators to intervene early to avoid serious consequences, etc.
The existing methods are mainly divided into three categories (i) feature-based models, mainly explored feature sets extracted from structures, time sequences, user content and profiles, however, they rely on a large number of hand-made feature projects, which cannot be generalized from one domain to another, (ii) probability generation models, aimed at modeling events, including poisson and hough processes, by classical examples of Time Point Processes (TPPs), but the existing methods make powerful assumptions on the diffusion mechanisms of cascades and have limited learning capacity for large-scale cascade data, (iii) deep learning models, which focus on static time cascades diffusion within specific snapshots in discrete time domains, have little focus on the whole evolution process and fully utilizing all cascade dynamics, mainly using simple event sequence models (e.g. RNNs) to model cascade events, or aided with GNNs to separately model time features and topology features, but in essence both are coupled, such separate modeling approaches lose cross-domain information and reduce the expression and predictive capacity of models.
Disclosure of Invention
Aiming at the problems of limited learning capability, cross-domain information loss and the like of the current information cascade prediction method, the invention aims to provide a novel modeling mode for a cascade diffusion process, summarize the cascade diffusion process into a Directed Acyclic Graph (DAG) which is continuously diffused in a continuous time domain, introduce a transducer (attention mechanism) into a Hooke process (Hawkes Process), and perform coupled modeling on time characteristics and topological characteristics in the continuous time domain by introducing an attention mechanism into the Hooke process, thereby realizing effective modeling of the cascade diffusion process and improving the accuracy of popularity prediction.
The general idea of the invention is to connect two levels of attention modules together to parameterize the intensity function of the hough process. The intensity function ensures the coupling time and topology dependence of the modeling cascade in the continuous time domain. Specifically, at a first level, a global dependency module is designed to dynamically capture long-term diffusion processes of events in the cascade that have occurred in the past, and main/non-main path assumptions are proposed to adaptively integrate the diffusion process onto the underlying DAG. This allows each node participating in the cascade to update its current hidden state at any location. At the second level, a local module is designed to capture the short-term evolution rate of the information cascade by encoding local patterns in a fixed time slice window. Through the design of two levels, we take the representation of the node on the global and local, and finally send a full connection layer to conduct popularity prediction.
Before a particular procedure is given, some basic terms and definitions are given. Given an information cascade C k and its forwarding history over observation time t o Where t j∈(0,to ], L is the number of forwarding within the observation time window (t 0,t0+to), and t 0 denotes the start time. Each triplet (t j,vj,uj) represents that at time t j, user u j forwarded user v j. For example, the forwarding history of fig. 1 is: (t 4,u1,u4),(t5,u3,u5) }, omitted hereinafter for simplicity In addition, we use Representing the set of timestamps preceding the current moment.
Based on the inventive concept, the information cascade prediction system based on the transform enhanced hoxwell process provided by the invention comprises:
The user embedding module (Initial User Embeddings) is used for acquiring the position-wise embedding and the time coding of the user and taking the position-wise embedding and the time coding of the user as the user embedding;
the global dependence module (Global Dependency Module) is used for distinguishing all precursor nodes from main path nodes and non-main path nodes for each user node in the user embedding, acquiring attention probability matrixes of the main path nodes and the non-main path nodes, further acquiring global dependence of each node, and obtaining global embedding of all nodes through full connection;
The local dependency module (Local Dependency Module) is used for acquiring the local mode of each node according to the set time slice window, and then carrying out pooling operation on the local mode of each node to obtain pooled representation of each node;
The intensity function acquisition module is used for parameterizing the intensity function of the Hox process according to the global embedding and local mode hiding vectors;
And the popularity prediction module is used for generating popularity prediction results by combining integration of the Hox process and full-connection processing results of the global embedding and local mode hiding vectors.
The user embedding module is characterized in that the user embedding X consists of position-wise embedding and time coding and is used as the input of a follow-up module (global dependency module).
The user embedding module comprises a Position-wise embedding acquisition sub-module and a time coding acquisition sub-module.
Position-wise embedding acquisition sub-module comprising a shared matrixWhere D is the dimension of user embedding and L is the maximum number of forwarding times. The j-th column of U represents the user of the j-th forwarding, and each cascade of nodes with the same order on the time axis shares an initial embedding. The shared matrix U is first randomly initialized and then subsequent context embedding associated with the underlying cascade is dynamically updated by subsequent global dependency modules. The shared matrix U greatly reduces the computational overhead, making the system more efficient. In one cascade, for user U j forwarded at the j-th time, let v j represent its one-hot index vector on shared matrix U, then the position-wise embedding of user U j is obtained by the Uv j index, whereRepresenting a set of all user-embedded one-hot indexes. Therefore, based on the shared matrix U, the Position-wise embedding acquisition sub-module obtains the Position-wise embedding of the user as UV.
2. A time-code acquisition sub-module (Temporal Encoding), a key component of the invention is a two-level attention module, which characterizes sequence positions with a time code z, since the order dependency is not considered by the attention mechanism, defined as follows:
The temporal coding is a variant from the original transducer absolute position coding. Wherein the method comprises the steps of Is a time code, D is the dimension of the time code, [ z (t j)]i represents the value encoding the ith dimension, i.e. [1, D ]. Is a set of temporal encodings.
Thus, the sequence is forwardedThe initial user-embedding of (c) can ultimately be noted as x=uv+z. The initial user-embedded X consists of a learning-shared position-wise embedded UV together with a fixed time code Z.
The global dependency of the information cascade may be regarded as an indication that a certain user may promote or suppress another user during the long-term diffusion.
The global dependency module includes:
The path perception attention layer is used for distinguishing all precursor nodes from main path nodes and non-main path nodes for each user node in user embedding, acquiring long-term dependent attention scores of the main path nodes and the non-main path nodes, and constructing attention probability matrixes of the main path and the non-main path;
and the first output network is used for carrying out full connection processing on the global dependence of each node to obtain global embedding of all the nodes.
After the user is embedded, a path perception attention layer is designed in the global module to parameterize Hawkesformer the intensity function, which is used for capturing the long-term diffusion dependence between the input cascade nodes, and the user complete diffusion dependence extraction sub-module is utilized to extract the time topology dependence of the coupling of the main path nodes and the non-main path nodes.
Theory of the hox process shows that the final popularity of the information cascade is related to the past participation of the user, which means that the current node affects not only its immediate (immediate) forwarder, but also non-immediate (non-immediate) forwarder by transitivity. In order to represent the effect of each node on the cascading future popularity from a topology perspective, a path-aware assumption is therefore made in the global module, namely a main path/non-main path assumption that the nodes on the main path are considered to have a larger contribution to the current node u j, and the nodes on the non-main path are considered to have a smaller contribution to the arrival of the u j node, but still have an effect on the final popularity, which also conforms to the assumption of the hox process. For the current node u j, all precursor nodes Is divided into two categories (1) nodes on the main path, noted asWhich includes all previous nodes on the entire forwarding path from the previous node u j back to the root node u 0, (2) nodes on the non-primary path, i.e., all nodes except the primary path node, are noted asAn example is given in fig. 2, where for the current nodes u 4,① and ② nodes on the main path are represented, and ③ nodes on the non-main path are represented.
Based on the above explanation, the implementation steps of the global dependency module are as follows:
giving a current node u j to obtain the current node u j and a certain node in the main path Long-term dependent attention score of (2):
Wherein a, b and Representing the user embedding matrix X via a main path Q (Query) matrix, respectivelyNon-primary path Q (Query) matrixAnd K (Key) matrixThe transformed user embeds vectors, and the three transformation matrices are used to distinguish the roles of the users in long-term dependence. Wherein < ·, · > represents the inner product function. Similarly, the current node u j is not a node in the non-primary pathThe long-term dependent attention score of (2) is:
Then, the attention weighted sum is calculated by the following formula, resulting in the complete diffusion dependency e j of user u j:
Finally, the context embedding of each node is obtained according to the attention scores of the main path and the non-main path of each node, wherein E= { E 0,e1,…,ej,…,eL }, and each row in E represents the global dependence of a certain node in the cascade; embedding matrix X via V (Value) matrix on behalf of user The converted user embeds the vector.
The above is calculated for the dependence of the current single user u j. In actual operation, the above attention is relied on to use a matrix parallel manner to accelerate the operation speed. The matrix calculation flow of global dependencies can be referred to fig. 3.
Firstly, based on the path-aware attention layer, the attention scores of the current node on the main path and the non-main path are calculated through the following formulas:
wherein M P is a derivative of the formula, One-hot matrix for preserving the attention scores of nodes on the main path and the non-main path, respectively, ifThen the elementCorrespondingly, ifFinally, the attention probability matrices for the main and non-main paths can be derived as:
where D is the dimension of the input embedding. Mask matrix To prevent the system from peeping future data if i < j, M i,j =0, otherwise M i,j = - ≡. The matrix M forces the softmax function to assign a valid long-term dependent attention only on forwarders before u j, while assigning 0 to subsequent users (i.e., users later than t j activates). Each column vector in matrix a represents u j for its predecessor nodesIncludes the main pathAnd non-primary pathThe upper node.
Then, a matrix form of global dependence of each node is obtained according to the following formula:
E=(WVX)A
For learning more semantic information, the path aware attention layer can also be designed here as a module based on a multi-headed attention mechanism, mapping context embedding to different subspaces, which can be computed in parallel using the matrix multiplication given above, and thus compute a plurality of heads E 1,E2,…,EH, which are then connected to e=w OConcat(E1,E2,…,EH), where Is an aggregation matrix. Finally, we feed E into the first output network (here fully connected neural network FCNN) resulting in all cascaded global embeddings:
h(tj)=H(:,j)
Wherein, the Is a parameter of the neural network, h (t j) isRepresents a long-term dependent representation of user u j.
The local dependence module can be called as local module for short, the increasing rate of the transfer quantity in the information cascade is not stable, and the short-term burst in the propagation process has great influence on the final popularity. To better embed the evolution rate of the information cascade, pooled attention is utilized here to capture local patterns. The pooling operation is performed on the local patterns of each current node based on the user global embedded representation that has been learned in the global dependency module.
The local dependency module includes:
the local mode acquisition sub-module is used for intercepting a part positioned in front of each node from the global dependence according to a set time slice window to obtain a local mode of each node;
chi Huazi module, configured to perform pooling operation on the local mode of each node to obtain pooled representation of each node;
and the mask attention layer is used for dynamically aggregating each node and the local mode pooling representation of the node before the node to generate a hidden vector of the local mode of each node.
Each current node has a local mode, a fixed time slice window t s is set, a local mode acquisition submodule is utilized to intercept the part positioned before each node from the global dependence to obtain the local mode of each node, and therefore, the local mode of the current node consists of the global dependence of the current node u j and part of precursor nodes and is expressed asRepresenting a collection of all local patterns of a cascade of information.
For each current node u j, its local pattern is pooled by the pooling sub-modulePooling operations are performed to aggregate features in time slices t s to generate new representationsThus, all local modesThe overall pooled representation of (a) may be formalized as: corresponds to a pooled representation of the local pattern.
To capture the evolution rate associated with cascading short-term bursts/dips, the present invention incorporates the effects of local patterns through a designed masking attention layer. Based on the mask attention mechanism, all local pattern embeddings (i.e., pooled representations) before the current time t j are dynamically aggregated with the mask attention layer, generating a hidden vector to summarize the evolution of all previous patterns:
g (·) is a linear transformation function, γ= { v (t 0),v(t1),…,v(tj),…,v(tL) } is a set of local dependencies of the evolution rate. The similarity function f (·, ·) is specified as:
The intensity function obtaining module parameterizes the intensity function of the hox process by the global module and the local module to obtain the intensity function λ (t) of the invention Hawkesformer. Given an information cascade C k and its forwarding history over observation time t o Representing a sequence preceding time stamp t j but excluding time t j. Here, the intensity function acquisition module models the continuous dynamics of the cascade diffusion using the following conditional intensity function:
Wherein Θ represents a parameter of the model, the forwarding time t is defined in a section [ t j,tj+1 ], f (x) =βlg (1+exp (x/β)) is a softplus function, the parameter β is used for constraining the intensity function to be positive, a first current represents an evolution process of a continuous time domain, α controls importance of interpolation nodes in two time stamps, w 1、w2 is learning weight of a global dependency and a local mode at time t respectively, and the intensity function λ (t) represents a push arrival rate at time t. Both global and local dependencies of Hawkesformer have expressive and flexible properties, the design of the intensity function being such that Hawkesformer captures the spatio-temporal dynamics of the coupling from a continuous time domain.
The popularity prediction module is mainly used for predicting information popularity. According to Hawkes theory, the survival function at time t is: the function represents a conditional probability that no event has occurred between t n, t. Subsequently, we can derive the probability that an event is observed at any time t':
Finally, by means of the popularity prediction module, the final popularity is predicted according to the following formula in combination with the time point process and the outputs of the two-level attention mechanism modules (global module and local module)
The first term Λ represents the integral of the hough process, the desired popularity, t o the observation time, t p the prediction time, and the second term the output of the global embedded and local pattern hidden vector splice via the second output network (here fully connected neural network FCNN).
The information cascade prediction system based on the transform enhanced Hox process also comprises an optimization module used for parameter optimization in the system training process.
Forwarding time of cascade C k The joint likelihood of (2) can be expressed as:
Wherein the method comprises the steps of Representing a set of times up to time t j.
Assuming that there are N information cascade label samples for training, { C 1,C2,…,CN }, the optimization module optimizes by minimizing the Mean Square Log Error (MSLE) loss of the fluidity and maximizing the log likelihood of all information cascades:
y k represents the true popularity of the kth information cascading label sample, And representing the predictive popularity corresponding to the kth information cascading label sample.
The Adam gradient descent algorithm is then used to optimize the system network parameters.
The invention also provides an information cascade prediction method based on a transform enhanced Hox process, which comprises the following steps:
s1, acquiring a user position-wise embedding and a time code, and taking the user position-wise embedding and the time code as a user embedding;
s2, distinguishing all precursor nodes as main path nodes and non-main path nodes for each user node in the user embedding, respectively acquiring attention probability matrixes of the main path nodes and the non-main path nodes, further acquiring global dependence of each node, and then acquiring global embedding of all nodes through full connection processing;
S3, obtaining a local mode of each node according to a set time slice window, and carrying out pooling operation on the local mode of each node to obtain pooled representation of each node;
S4, parameterizing an intensity function of the Hox process according to the global embedding and the local mode hiding vectors;
s5, integrating the Hox process and generating a popularity prediction result by combining the full connection processing result of the global embedded and local mode hidden vectors.
The present invention thus implements a system and method for modeling and predicting the coupling of time and topology features over a continuous time domain by introducing a two-layer attention mechanism into the hough process. At the first level we distinguish DAGs from non-primary paths in terms of primary paths, which layer is used to capture global dependencies between nodes, i.e. time and topology dependencies of couplings, and at the second level we propose a local pool attention module to embed the evolution rate of the cascade to simulate bursts and dips of short-term forwarding numbers.
Compared with the prior art, the invention has the following beneficial effects:
1. The method comprises the steps of firstly obtaining position-wise embedding and time coding of a user as user embedding, then introducing a path perception assumption from the view angle of a topological structure, designing two layers of attention layers from two angles of global embedding and local mode, parameterizing the intensity function of the Hox process, combining the Hox process, the global embedding and the local mode, learning the time and the topological random characteristics coupled in the information cascade diffusion process so as to conduct popularity prediction.
2. The invention provides a two-level attention architecture which parameterizes the intensity function of a Hox process, firstly designs a global module attention layer based on path perception to capture the long-term diffusion dependency relationship between nodes in a dynamic diffusion topology, and then provides a local module to capture the short-term evolution rate in information cascade.
3. The invention customizes a position-wise user embedding method capable of learning for the information cascade directed acyclic graph, improves the parameter efficiency and reduces the calculation burden;
4. The method is extremely important for understanding the evolution process of the social network platform and explaining the cascade epidemic reasons, for example, the method predicts the forwarding quantity of a certain microblog in a future period, explains the epidemic factors, and can be used for downstream tasks such as marketing design, rumor prediction, influence maximization and the like.
Drawings
FIG. 1 is an information cascade diffusion process presenting a Directed Acyclic Graph (DAG). The modeling mode of the existing model separately models the time feature of the node and the topology feature appearing in the graph, for example, the time feature is fed into RNN, and the topology feature is fed into GNN. But both are in coupled form in nature, the topology in the DAG is determined simultaneously when the forwarding time of a certain node is determined, e.g. u 2's forwarding behavior depends on u 0 instead of its previous node u 1 in time, so the DAG forms a time topology dependency of the coupling, whereas separate modeling cannot capture the dependency of the coupling.
FIG. 2 is a main path and non-main path hypothesis in the global dependency module, which can directly acquire long-term dependencies on the main path (non-main path), unlike the chained structure based on the RNN sequence model.
FIG. 3 is an implementation detail diagram of the primary and non-primary path hypotheses in the global dependency module, i.e., the implementation detail of FIG. 2.
FIG. 4 is a model diagram of an information cascade prediction system based on a transform enhanced Hox process of the present invention, including an initial user embedding, a global dependency module, a local dependency module, and a prediction module.
FIG. 5 is a visual depiction of an attention probability matrix A, representing three cascade-corresponding thermodynamic diagrams (Heatmaps) and cascade structures randomly chosen in Weibo dataset, wherein the diagonal lines in the thermodynamic diagrams represent the attention of the current node to itself, each row represents the attention of the current node to all predecessor nodes, each entry ranges from [0,1], the darker the entry color, the higher the attention, and the sum of each row is 1.
Interpretation of the terms
Information concatenation Information Cascade figure 1 illustrates this process in an example where after a root node publishes a piece of public content, friends and caregivers of the root node will see the public content and forward it one after the other. Thus, public content propagates through the edges of a social network and creates a cascade of information, a typical task of which is to predict the size of a certain cascade (tweets, microblogs, etc.), potentially affected users after a period of observation. The theory basis can be referred to in the literature 【J.Cheng,L.Adamic,P.A.Dow,J.M.Kleinberg,and J.Leskovec.Can cascades be predictedIn Proc.of WWW,2014.】
Hox process (HawkesProcess, HP) the Hox process is a mainstream time point process (Temporal Point Processes, TPPs). HP has three key factors, namely (1) user influence, more contribution of the user with influence to the arrival rate of new forwarding, more forwarding of the message forwarded by the user with influence, a self-excitation mechanism, each forwarding can influence the arrival rate of new forwarding in the future, and (3) a time attenuation effect. The impact of forwarding decays with increasing time. However, conventional HP simplifies the arrival rate of events, models the arrival of events using, for example, exponential or power law functions, and makes strong assumptions about the diffusion mechanisms in cascading events, limiting their learning ability over large-scale cascading data.
Self-attention mechanism (Self-Attention Mechanism) the Self-attention mechanism, a special case of the attention mechanism recently proposed in Transformer【Vaswani,Ashish,et al."Attention is all you need."Advances in neural information processing systems 30(2017).】, has been widely used in various NLP tasks such as machine translation, language modeling, etc., and has produced considerable improvements. It encodes the input sequence by correlating the input token, which consists of the matrices Q (query), K (Keys) and V (Values). The matrix form of the attention function is:
Detailed Description
The invention will be further described with reference to the accompanying drawings.
Example 1
For convenience, in the rest of the invention we set out the Twitter (Twitter) cascade as an example, but the invention is also applicable to other types of information cascades, such as microblogging, academic citations, etc., and we will give examples in the experimental section.
The information cascade prediction system based on the transform enhanced hoxwell process provided in this embodiment, as shown in fig. 4, includes a user embedding module, a global dependency module, a local dependency module, an intensity function obtaining module, and a popularity prediction module.
(1) User embedded module
And the user embedding module is used for acquiring the user position-wise embedding and the time code and taking the user position-wise embedding and the time code as the user embedding.
The user embedding module comprises a Position-wise embedding acquisition sub-module and a time coding acquisition sub-module.
Position-wise embedding acquisition sub-module comprising a shared matrixIn one cascade, for user U j forwarded at the j-th time, let v j represent its one-hot index vector on shared matrix U, then the position-wise embedding of user U j is obtained by the Uv j index, whereRepresenting a set of all user-embedded one-hot indexes. Therefore, based on the shared matrix U, the Position-wise embedding acquisition sub-module obtains the Position-wise embedding of the user as UV.
The time code acquisition submodule characterizes the sequence position by using the time code z, and is defined as follows:
The temporal coding is a variant from the original transducer absolute position coding. Wherein the method comprises the steps of Is a time code, D is the dimension of the time code, [ z (t j)]i represents the value encoding the ith dimension, i.e. [1, D ]. Is a set of temporal encodings.
Thus, the user embedding can ultimately be noted as x=uv+z. The initial user-embedded X consists of a learning-shared position-wise embedded UV together with a fixed time code Z.
The shared matrix U is initially initialized, and its parameters are dynamically updated together by the designed optimization module using back propagation to obtain subsequent context embedding associated with the underlying cascade.
(2) Global dependency module
And the global dependence module is used for distinguishing all the precursor nodes from the main path nodes and the non-main path nodes for each user node in the user embedding, acquiring the attention probability matrixes of the main path nodes and the non-main path nodes, further acquiring the global dependence of each node, and obtaining the global embedding of all the nodes through full connection processing.
The global dependency module includes a path aware attention layer and a first output network.
The path perception attention layer is used for distinguishing all precursor nodes from main path nodes and non-main path nodes for each user node in user embedding, acquiring long-term dependent attention scores of the main path nodes and the non-main path nodes, constructing attention probability matrixes of the main path and the non-main path, and acquiring global dependence according to the attention probability matrixes of the main path and the non-main path of each node.
As shown in fig. 2 and 3, a path aware hypothesis, i.e., a main path/non-main path hypothesis, is set forth in the global module. Giving a current node u j to obtain the current node u j and a certain node in the main pathLong-term dependent attention score of (2):
Wherein a, b and Representing the user embedding matrix X via a main path Q (Query) matrix, respectivelyNon-primary path Q (Query) matrixAnd K (Key) matrixThe transformed user embeds vectors, and the three transformation matrices are used to distinguish the roles of the users in long-term dependence. Wherein < ·, · > represents the inner product function. Similarly, the current node u j is not a node in the non-primary pathThe long-term dependent attention score of (2) is:
Then, the attention weighted sum is calculated by the following formula, resulting in the complete diffusion dependency e j of user u j:
Finally, the global dependencies of all nodes are constructed according to the complete diffusion dependencies of each node. E= { E 0,e1,…,ej,…,eL }, each row in E representing the global dependence of a certain node in the cascade, the context embedding of each node is obtained according to the attention scores of the main path and the non-main path of each node; embedding matrix X via V (Value) matrix on behalf of user The converted user embeds the vector.
The above is calculated for the dependence of the current single user u j. In actual operation, the above attention is relied on to use a matrix parallel manner to accelerate the operation speed.
Based on the path-aware attention layer, the attention scores of the current node on the main path and the non-main path are calculated according to the following formula:
wherein M P is a derivative of the formula, One-hot matrix for preserving the attention scores of nodes on the main path and the non-main path, respectively, ifThen the elementCorrespondingly, ifFinally, the attention probability matrices for the main and non-main paths can be derived as:
where D is the dimension of the input embedding. Mask matrix To prevent the system from peeping future data if i < j, M i,j =0, otherwise M i,j = - ≡. The matrix M forces the softmax function to assign a valid long-term dependent attention only on forwarders before u j, while assigning 0 to subsequent users (i.e., users later than t j activates). Each column vector in matrix a represents u j for its predecessor nodesIncludes the main pathAnd nodes on non-primary paths
Then, a matrix form of global dependence of each node is obtained according to the following formula:
E=(WVX)A
To learn more semantic information, the path aware attention layer is designed here as a module based on a multi-headed attention mechanism, mapping context embedding into different subspaces, which can be computed in parallel using the matrix multiplication given above, and thus compute a plurality of heads E 1,E2,…,EH, which are then connected to e=w OConcat(E1,E2,…,EH), where Is an aggregation matrix. And the first output network is used for carrying out full connection processing on the global dependence of all the nodes to obtain global embedding of all the nodes. In this embodiment, using the fully connected neural network FCNN as the first output network, the global dependencies E of all nodes are fed into the first output network, so as to obtain global embedments of all nodes:
h(tj)=H(:,j)
Wherein, the Is a parameter of the neural network, h (t j) isRepresents a long-term dependent representation of user u j.
(3) Local dependency module
The local dependency module is used for acquiring the local mode of each node according to the set time slice window, carrying out pooling operation on the local mode of each node to obtain pooled representation of each node, and generating a hidden vector of the local mode of each node based on a mask attention mechanism.
The local dependency module includes a local pattern retrieval sub-module, a pooling sub-module, and a mask attention layer.
And the local mode acquisition sub-module is used for intercepting a part positioned in front of each node from the global dependence according to the set time slice window to obtain the local mode of each node. Setting a fixed time slice window t s, utilizing a local mode acquisition submodule to intercept the part positioned before each node from the global dependence to obtain the local mode of each node, so that the local mode of the current node is formed by the global dependence of the current node u j and part of precursor nodes and is expressed asRepresenting a collection of all local patterns of a cascade of information.
And Chi Huazi module, configured to perform pooling operation on the local mode of each node to obtain a pooled representation of each node. For each current node u j, its local pattern is pooled by the pooling sub-modulePooling operations are performed to aggregate features in time slices t s to generate new representationsThus, all local modesThe overall pooled representation of (a) may be formalized as: corresponds to a pooled representation of the local pattern.
And the mask attention layer is used for dynamically aggregating each node and the local mode pooling representation of the node before the node to generate a hidden vector of the local mode of each node. Based on the mask attention mechanism, all local pattern embeddings (i.e., pooled representations) before the current time t j are dynamically aggregated with the mask attention layer, generating a hidden vector to summarize the evolution of all previous patterns:
g (·) is a linear transformation function, y= { v (t 01),v(t1),…,v(tj),…,v(tL) } is a set of local dependencies of evolution rate. The similarity function f (·, ·) is specified as:
(4) Intensity function acquisition module
And the intensity function acquisition module is used for parameterizing the intensity function of the traditional Hox process according to the global embedding and the local mode hiding vectors. Here, the intensity function acquisition module models the continuous dynamics of the cascade diffusion using the following conditional intensity function:
Wherein Θ represents a parameter of the model, the forwarding time t is defined in a section [ t j,tj+1 ], f (x) =βlg (1+exp (x/β)) is a softplus function, the parameter β is used for constraining the intensity function to be positive, a first current represents an evolution process of a continuous time domain, α controls importance of interpolation nodes in two time stamps, w 1、w2 is learning weight of a global dependence and a local pattern at time t, and λ (t) represents a push arrival rate at time t.
(5) Popularity prediction module
And the popularity prediction module is used for generating popularity prediction results by combining integration of the Hox process and full-connection processing results of the global embedding and local mode hiding vectors. Here, the final popularity is predicted according to the following formula
The first term Λ represents the integral of the hough process, the desired popularity, t o the observation time, t p the prediction time, and the second term the output of the global embedded and local pattern hidden vector splice via the second output network (here fully connected neural network FCNN).
Example 2
This example is a further improvement over example 1.
The information cascade prediction system based on the transform enhanced Hox process further comprises an optimization module, wherein the optimization module is used for optimizing parameters in the system training process.
Assuming that there are N information cascade label samples for training, { C 1,C2,…,CN }, the optimization module optimizes by minimizing the Mean Square Log Error (MSLE) loss of the fluidity and maximizing the log likelihood of all information cascades:
y k represents the true popularity of the kth information cascading label sample, And representing the predictive popularity corresponding to the kth information cascading label sample.
The Adam gradient descent algorithm is then used to optimize the system network parameters.
Example 3
The embodiment provides an information cascade prediction method based on a transform reinforced Hox process, which comprises the following steps:
s1, acquiring user position-wise embedding and time coding, and taking the user position-wise embedding and the time coding as user embedding.
The step uses a user embedding module to respectively acquire a shared position-wise embedding UV and a fixed time code Z, and obtains an end user embedding x=uv+z.
S2, distinguishing all precursor nodes as main path nodes and non-main path nodes for each user node in the user embedding, respectively acquiring long-term dependency attention scores of the main path nodes and the non-main path nodes, further acquiring global dependencies of each node, and obtaining global embedding of all nodes through full connection processing.
The method comprises the steps of firstly distinguishing all precursor nodes from each user node in user embedding by using a path perception attention layer, obtaining attention probability matrixes of the main path nodes and the non-main path nodes, constructing global dependence of all nodes according to the attention probability matrixes of the main path and the non-main path of each node, and obtaining global embedding of all nodes through full connection processing of a first output network.
S3, obtaining a local mode of each node according to a set time slice window, carrying out pooling operation on the local mode of each node to obtain pooled representation of each node, and generating a local mode hiding vector of each node based on a mask attention mechanism.
The method comprises the steps of firstly utilizing a local mode acquisition submodule to intercept a part positioned in front of each node from global dependence according to a set time slice window to obtain a local mode of each node, then carrying out pooling operation on the local mode of each node through a pooling submodule to obtain pooled representation of each node, and dynamically aggregating each node and the local mode pooled representation of the node before the node by using a mask attention layer to generate a hidden vector of the local mode of each node.
S4 is used to parameterize the intensity function of the hough process in terms of global embedding and local mode concealment vectors.
Here, the intensity function acquisition module is used to parameterize the intensity function of the conventional hough process according to the global embedding and local mode concealment vectors, resulting in the intensity function of the Hawkesformer model.
S5, integrating the Hox process and generating a popularity prediction result by combining the full connection processing result of the global embedded and local mode hidden vectors.
Here, integration of the hough process and fully connected processing results of the global embedded and local pattern hidden vectors are combined to generate a popularity prediction result by a popularity prediction module.
Application example
The application uses three different real data sets (Twitter, weibo and APS, the first data set source reference 【Lilian Weng,Filippo Menczer,and Yong-YeolAhn.2013.Virality prediction andcommunity structure in social networks.Scientific Reports3(2013)】, the second data set source reference 【Qi Cao,Huawei Shen,Keting Cen,Wentao Ouyang,and Xueqi Cheng.2017.Deep-Hawkes:Bridging the gap between prediction and understanding of informationcascades.InCIKM.1149–1158】) and the third data set source reference [ https:// journ als. Org/datasets ] to describe the prediction effect of the information cascade prediction system based on the transform enhanced Hox process.
Three data sets were divided into training, validation and test sets at ratios of 70%, 15%, respectively.
Firstly, training and optimizing an information cascade prediction system (Hawkesformer) based on a transducer enhanced Hox process provided in the embodiment 2 by using a training set, and verifying a system obtained by training by using a verification set every time training is completed until a loss function of the systemTend to stabilize.
Then, the trained Hawkesformer was tested on the test set according to the information cascade prediction method based on the transform enhanced houx procedure provided in example 3, and compared with 7 different baseline models (Feature-Deep, deepHawkes, casCN, FOREST, vaCas, tempCas, hawkesformer), and the commonly used measurement method MSLE was used (smaller value, better prediction effect).
TABLE 1 comparison of HawkesFormer versus baseline model Performance on three data sets
Note that bold values indicate better results than other settings. All experimental results were best obtained in 10 trials with statistical significance ρ <0.05.
The description of the 7 baseline methods in table 1 is as follows:
Feature-Deep extracting structural features, temporal features, user features, and inputting them into the bilayer MLP.【Xu,Xovee,et al."CasFlow:Exploring hierarchical structures and propagation uncertainty for cascade prediction."IEEE Transactions on Knowledge and Data Engineering(2021).】
DEEPHAWKES the self-excitation point process of deep learning and Hawkes is combined to bridge the gap between predictive performance and interpretability .【Qi Cao,Huawei Shen,Keting Cen,Wentao Ouyang,and Xueqi Cheng.2017.Deep-Hawkes:Bridging the gap between prediction and understanding of informationcascades.InCIKM.1149–1158.】
CasCN is a hybrid model, by stacking GNNs to obtain node embedding, and using RNNs to simulate the evolution of diffusion .【Chen,Xueqin,et al."Information diffusion prediction via recurrent cascades convolution."2019IEEE 35th international conference on data engineering(ICDE).IEEE,2019.】
Forest, combining reinforcement learning and RNN to address multi-scale cascade prediction problem .【Yang,Cheng,et al."Multi-scale Information Diffusion Prediction with Reinforced Recurrent Networks."IJCAI.2019.】
VaCas integration of hierarchical diffusion model and spatio-temporal structural features of information concatenation while also capturing diffusion uncertainty .【Fan Zhou,Xovee Xu,Kunpeng Zhang,GoceTrajcevski,and Ting Zhong.2020.Variational information diffusion for probabilistic cascades prediction.InINFOCOM.1618–1627.】
TempCas implementing a heuristic approach to embedding cascading graphs using RNNs and learning historical short-term trends using LSTM on specially designed attention-based CNNs .【Tang,Xiangyun,et al."Fully exploiting cascade graphs for real-time forwarding prediction."Proceedings of the AAAI Conference on Artificial Intelligence.Vol.35.No.1.2021.】
As can be seen from the experimental results in Table 1, the information cascade prediction system based on the transform enhanced Hox process provided by the invention can greatly improve the accuracy of popularity prediction compared with other baseline models.
In addition, in order to study the validity of the long-term diffusion dependency relationship between the nodes in the global dependency module capturing dynamic diffusion topology, the application example further obtains an attention probability matrix A, as shown in fig. 5. The following stepwise analysis was performed:
(1) For smaller (13 users) information cascades C 1, the information cascade prediction system provided by the invention focuses more on the main path and early users participating in the diffusion. For each row of matrix A we can see that the user on the main path typically has the most prominent color, which verifies our main/non-main assumption in the information propagation process (similar phenomena are also shown in cascades C 2 and C 3). For node 9 in cascade C 1, hawkesFormer detects that 0→2→9 is the primary path, indicating that node 0 and node 2 have a greater impact on node 9 than other nodes on the primary path have less impact on the forwarding behavior of node 9. For node 8, hawkesFormer detects that 0→4→8 is the main path, indicating that node 0 and node 4 have a greater impact on node 8, and that other nodes on the non-main node have less impact on the forwarding behavior of node 8.
(2) For a medium (58 users) information cascade C 2, a long-term dependency between the root user and the following node is observed. This suggests that long-term dependencies between nodes in a dynamic diffusion topology need to be modeled, because sequence models such as RNNs are difficult to train on long sequences and face gradient vanishing/explosion problems when the sequences are too long. In addition, several short bursts were observed, which again verifies the hypothesis of the present invention to learn local patterns for information concatenation.
(3) Also shown is a viral propagation cascade C 3, which has 106 users with a maximum depth of 8. Most of the rotations (84%) in this cascade are from indirect user to root. We have found that even the "tail" users at the end of the diffusion tree may trigger bursts (e.g., users 19 and 43). Some users act as hubs connecting other influential users, bringing them into the information dissemination process.
Therefore, the information cascade prediction system based on the transform reinforced Hox process provided by the invention not only can improve the accuracy of popularity prediction, but also has stronger interpretability. Experiments on three real data sets demonstrate the superior performance of the present invention over the most advanced baseline model. The improvement in performance of the proposed solution shows that combining the advantages of deep learning (transform) with probability models (hough process) is a new direction to effectively simulate the information cascade diffusion process and improve the prediction accuracy.
In summary, the modeling mode of the information propagation process is crucial to understanding the information propagation mechanism and popularity prediction task, the evolution of the information propagation presents a directed acyclic graph (DIRECTED ACYCLIC GRAPH, DAG), and the topological features and time are coupled in nature, but the existing research adopts a mode of separately modeling the two most important features (topological features and time features) of the cascade, so that cross-domain information is lost, and the expression and prediction capability of the model are reduced. The invention relates to a multi-level attention mechanism and a Hoxwell self-excitation point process so as to realize more effective and reasonable modeling of information cascade and achieve more accurate popularity prediction. In particular, using a two-layer attention structure to parameterize the intensity function of the hough process has been achieved to efficiently acquire representation knowledge from a continuous time domain.
Those of ordinary skill in the art will recognize that the embodiments described herein are for the purpose of aiding the reader in understanding the principles of the present invention and should be understood that the scope of the invention is not limited to such specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations from the teachings of the present disclosure without departing from the spirit thereof, and such modifications and combinations remain within the scope of the present disclosure.

Claims (10)

1.一种基于Transformer增强霍克斯过程的信息级联预测系统,其特征在于,包括:1. An information cascade prediction system based on Transformer-enhanced Hawkes processes, characterized in that it comprises: 用户嵌入模块,用于获取用户position-wise嵌入和时间编码,并以用户position-wise嵌入和时间编码作为用户嵌入;The user embedding module is used to obtain the user position-wise embedding and time encoding, and to use the user position-wise embedding and time encoding as the user embedding. 全局依赖模块,用于对用户嵌入中每个用户节点,区分所有前驱节点为主路径节点和非主路径节点,并获取主路径节点和非主路径节点的注意力概率矩阵,进而获取每个节点的全局依赖,再经全连接得到所有节点的全局嵌入;The global dependency module is used to distinguish all predecessor nodes from main path nodes and non-main path nodes for each user node in the user embedding, and obtain the attention probability matrix of main path nodes and non-main path nodes, thereby obtaining the global dependency of each node, and then obtaining the global embedding of all nodes through a fully connected layer. 局部依赖模块,用于依据设定的时间片窗口,获取每个节点的局部模式,然后对每个节点的局部模式进行池化操作得到每个节点的池化表示;最后基于掩码注意力机制,生成每个节点局部模式的隐藏向量;The local dependency module is used to obtain the local pattern of each node according to the set time slice window, and then perform a pooling operation on the local pattern of each node to obtain the pooled representation of each node; finally, based on the mask attention mechanism, the hidden vector of the local pattern of each node is generated. 强度函数获取模块,用于依据全局嵌入和局部模式隐藏向量来参数化霍克斯过程的强度函数;The intensity function acquisition module is used to parameterize the intensity function of the Hawkes process based on the global embedding and local pattern hiding vectors. 流行度预测模块,用于结合霍克斯过程的积分,以及全局嵌入和局部模式隐藏向量的全连接处理结果生成流行度预测结果。The popularity prediction module combines the integral of the Hawkes process with the fully connected processing results of global embedding and local pattern hiding vectors to generate popularity prediction results. 2.根据权利要求1所述的基于Transformer增强霍克斯过程的信息级联预测系统,其特征在于,所述用户嵌入模块包括Position-wise嵌入获取子模块和时间编码获取子模块;2. The information cascade prediction system based on Transformer-enhanced Hawkes process according to claim 1, wherein the user embedding module includes a position-wise embedding acquisition submodule and a time-encoding acquisition submodule; 所述Position-wise嵌入获取子模块,包括共享矩阵U,在一个级联中,对于第j次转发的用户uj,让vj代表它在共享矩阵U上的one-hot索引向量,那么用户uj的position-wise嵌入就由Uvj索引得到,其中V代表所有用户嵌入的one-hot索引集合;所述Position-wise嵌入获取子模块得到的用户Position-wise嵌入为UV;The Position-wise embedding acquisition submodule includes a shared matrix U. In a cascade, for user uj in the j-th forward, let vj represent its one-hot index vector on the shared matrix U. Then the position-wise embedding of user uj is obtained by indexing Uvj , where V represents the set of one-hot indices of all user embeddings. The user position-wise embedding obtained by the Position-wise embedding acquisition submodule is UV. 时间编码获取子模块利用时间编码z来表征序列位置,定义如下:The time-encoding acquisition submodule uses time-encoding z to represent the sequence position, as defined below: 其中是时间编码,D是时间编码的维度,[z(tj)]i代表编码第i个维度的值,是时间编码的集合;in It is a time-encoded sequence, where D is the dimension of the time-encoded sequence, and [z( tj )] i represents the value of the i-th dimension. It is a collection of time codes; 用户嵌入最终记为:X=UV+Z。User embedding is ultimately denoted as: X = UV + Z. 3.根据权利要求1所述的基于Transformer增强霍克斯过程的信息级联预测系统,其特征在于,所述全局依赖模块包括:3. The information cascade prediction system based on Transformer-enhanced Hawkes processes according to claim 1, characterized in that the global dependency module comprises: 路径感知注意力层,用于对用户嵌入中每个用户节点,区分所有前驱节点为主路径节点和非主路径节点,并获取主路径节点和非主路径节点的长期依赖注意力分数,构建主路径和非主路径的注意力概率矩阵;再依据每个节点的主路径和非主路径的注意力概率矩阵,获取其全局依赖;The path-aware attention layer is used to distinguish all predecessor nodes from main path nodes and non-main path nodes for each user node in the user embedding, and to obtain the long-term dependency attention scores of main path nodes and non-main path nodes, and to construct the attention probability matrix of main path and non-main path; then, based on the attention probability matrix of main path and non-main path of each node, its global dependency is obtained. 第一输出网络,用于对每个节点的全局依赖进行全连接处理,得到所有节点的全局嵌入。The first output network is used to perform fully connected processing on the global dependencies of each node, resulting in the global embedding of all nodes. 4.根据权利要求3所述的基于Transformer增强霍克斯过程的信息级联预测系统,其特征在于,首先,基于路径感知注意力层,通过以下公式计算出当前节点分别在主路径、非主路径上的注意力分数:4. The information cascade prediction system based on the Transformer-enhanced Hawkes process according to claim 3, characterized in that, firstly, based on the path-aware attention layer, the attention scores of the current node on the main path and non-main path are calculated using the following formula: Pscore=(WPX)T(WKX)⊙MP P score =(W P X) T (W K X)⊙M P Nscore=(WNX)T(WKX)⊙MN N score =(W N X) T (W K X)⊙M N 其中MP分别用来保留主路径和非主路径上节点的注意力分数的one-hot矩阵,如果 表示主路径节点集合,则元素相应的,如果 表示非主路径节点集合,最终,主路径和非主路径的注意力概率矩阵推导为:Among them, MP , One-hot matrices are used to retain the attention scores of nodes on the main path and non-main path, respectively. If the set represents the main path node set, then the elements are... Correspondingly, if Represents the set of non-main path nodes. Finally, the attention probability matrices for the main path and non-main paths are derived as follows: 其中,D是输入嵌入的维数;掩码矩阵用来防止系统偷窥未来的数据,如果i<j,Mi,j=0,否则Mi,j=-∞;矩阵A中的每个列向量代表uj对其先驱节点的注意力,包括主路径和非主路径上的节点 Where D is the dimension of the input embedding; the mask matrix To prevent the system from spying on future data, if i < j, M<sub>i,j</sub> = 0, otherwise M <sub>i,j </sub> = -∞; each column vector in matrix A represents u<sub>j</sub> relative to its predecessor node. Attention, including the main path Nodes on non-main paths 然后,通过用户全局依赖提取子模块得到每个节点全局依赖的矩阵形式:E=(WVX)A;Then, the matrix form of the global dependencies of each node is obtained through the user global dependency extraction submodule: E = (W V X)A; 通过第一输出网络对E进行全连接,得到所有级联中节点的全局嵌入:By performing a fully connected operation on E through the first output network, the global embeddings of all nodes in the cascade are obtained: h(tj)=H(:,j)h(t j )=H(:,j ) 其中,是神经网络的参数;h(tj)是中的第j列,代表用户uj的长期依赖表示。in, These are the parameters of the neural network; h( tj ) is... The j-th column in the table represents the long-term dependency representation of user uj . 5.根据权利要求1至4任一项所述的基于Transformer增强霍克斯过程的信息级联预测系统,其特征在于,所述局部依赖模块包括:5. The information cascade prediction system based on the Transformer-enhanced Hawkes process according to any one of claims 1 to 4, characterized in that the local dependency module comprises: 局部模式获取子模块,用于依据设定的时间片窗口,从全局依赖中截取位于每个节点之前的部分,得到每个节点的局部模式;The local mode acquisition submodule is used to extract the portion of the global dependencies that precedes each node based on a set time slice window, thus obtaining the local mode of each node; 池化子模块,用于对每个节点的局部模式进行池化操作得到每个节点的池化表示;The pooling submodule is used to perform pooling operations on the local patterns of each node to obtain the pooled representation of each node; 掩码注意力层,用于动态聚合每个节点和与其之前节点的局部模式池化表示,生成每个节点局部模式的隐藏向量。The masked attention layer is used to dynamically aggregate the local pattern pooled representations of each node and its preceding nodes, generating a hidden vector for the local pattern of each node. 6.根据权利要求5所述的基于Transformer增强霍克斯过程的信息级联预测系统,其特征在于,利用掩码注意力层动态聚合当前时间tj之前所有的局部模式嵌入,生成一个隐藏向量来概括所有先前模式的演化过程:6. The information cascade prediction system based on the Transformer-enhanced Hawkes process according to claim 5, characterized in that, a masked attention layer is used to dynamically aggregate all local pattern embeddings before the current time tj to generate a hidden vector that summarizes the evolution process of all previous modes: g(·)是一个线性变换函数,γ={v(t0),v(t1),…,v(tj),…,v(tL)}是演化速率的局部依赖的集合;相似函数f(·,·)指定为:g(·) is a linear transformation function, γ={v(t 0 ),v(t 1 ),…,v(t j ),…,v(t L )} is the set of local dependencies of the evolution rate; the similarity function f(·,·) is specified as: 7.根据权利要求6所述的基于Transformer增强霍克斯过程的信息级联预测系统,其特征在于,强度函数获取模块使用以下条件强度函数对级联扩散的连续动态进行建模:7. The information cascade prediction system based on the Transformer-enhanced Hawkes process according to claim 6, characterized in that the intensity function acquisition module uses the following conditional intensity function to model the continuous dynamics of cascade diffusion: 其中,Θ代表模型的参数,转发时间t定义在区间[tj,tj+1),其中f(x)=βlg(1+exp(x/β))是softplus函数,参数β用于约束强度函数为正;第一项current表示连续时域的演化过程,α控制两个时间戳内的插值节点的重要程度;w1、w2分别是全局依赖和局部模式在t时刻的学习权重,λ(t)表示t时刻的推文到达率。Where Θ represents the model parameters, the forwarding time t is defined in the interval [ tj , tj +1 ), where f(x)=βlg(1+exp(x/β)) is the softplus function, the parameter β is used to constrain the strength function to be positive; the first term current represents the evolution process in the continuous time domain, α controls the importance of the interpolation nodes within the two timestamps; w1 and w2 are the learning weights of global dependency and local pattern at time t, respectively, and λ(t) represents the tweet arrival rate at time t. 8.根据权利要求7所述的基于Transformer增强霍克斯过程的信息级联预测系统,其特征在于,流行度预测模块按照以下公式来预测最终的流行度 8. The information cascade prediction system based on the Transformer-enhanced Hawkes process according to claim 7, characterized in that the popularity prediction module predicts the final popularity according to the following formula. 第一项Λ表示霍克斯过程的积分,表示期望的流行度大小;to表示观测时间;tp表示预测时间;第二项表示将全局嵌入和局部模式隐藏向量拼接结果经第二输出网络后的输出结果。The first term Λ represents the integral of the Hawkes process, indicating the expected popularity; t <sub>o</sub> represents the observation time; t<sub> p </sub> represents the prediction time; the second term represents the output of the concatenation of the global embedding and local pattern hiding vectors through the second output network. 9.根据权利要求8所述的基于Transformer增强霍克斯过程的信息级联预测系统,其特征在于,还包括优化模块,用于系统训练过程中的参数优化;优化模块通过最小化均方对数误差对流行度的损失和最大化所有信息级联的对数似然进行优化:9. The information cascade prediction system based on the Transformer-enhanced Hawkes process according to claim 8, characterized in that it further includes an optimization module for parameter optimization during system training; the optimization module optimizes by minimizing the loss of popularity due to the mean square logarithmic error and maximizing the log-likelihood of all information cascades: yk表示第k条信息级联标签样本的真实流行度,表示第k条信息级联标签样本对应的预测流行度,l(Ck)表示一条级联Ck的联合似然,N表示信息级联标签样本数量。y <sub>k</sub> represents the true popularity of the k-th message cascaded label sample. Let l(C<sub>k</sub> ) represent the predicted popularity of the k-th cascaded label sample, l(C<sub>k</sub>) represent the joint likelihood of a cascaded C<sub> k </sub>, and N represent the number of cascaded label samples. 10.一种基于Transformer增强霍克斯过程的信息级联预测方法,其包括以下步骤:10. An information cascade prediction method based on Transformer-enhanced Hawkes processes, comprising the following steps: S1获取用户position-wise嵌入和时间编码,并以用户position-wise嵌入和时间编码作为用户嵌入;S1 obtains the user position-wise embedding and time encoding, and uses the user position-wise embedding and time encoding as the user embedding; S2对用户嵌入中每个用户节点,区分所有前驱节点为主路径节点和非主路径节点,并分别获取主路径节点和非主路径节点的长期依赖注意力分数,进而获取每个节点的全局依赖,再经全连接处理得到所有节点的全局嵌入;S2 distinguishes all predecessor nodes from main path nodes and non-main path nodes for each user node in the user embedding, and obtains the long-term dependency attention scores of main path nodes and non-main path nodes respectively, thereby obtaining the global dependency of each node, and then obtains the global embedding of all nodes through fully connected processing. S3依据设定的时间片窗口,获取每个节点的局部模式,然后对每个节点的局部模式进行池化操作得到每个节点的池化表示;然后基于掩码注意力机制,生成每个节点的局部模式隐藏向量;S3 obtains the local pattern of each node according to the set time slice window, and then performs a pooling operation on the local pattern of each node to obtain the pooled representation of each node; then, based on the mask attention mechanism, it generates the local pattern hidden vector of each node. S4用于依据全局嵌入和局部模式隐藏向量来参数化霍克斯过程的强度函数;S4 is used to parameterize the intensity function of the Hawkes process based on the global embedding and local pattern hiding vectors; S5用于结合霍克斯过程的积分,以及全局嵌入和局部模式隐藏向量的全连接处理结果生成流行度预测结果。S5 is used to combine the integral of the Hawkes process with the fully connected processing results of global embedding and local pattern hiding vectors to generate popularity prediction results.
CN202210983826.9A 2022-08-17 2022-08-17 Information Cascade Prediction System and Method Based on Transformer Enhanced Hawkes Process Active CN115409155B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210983826.9A CN115409155B (en) 2022-08-17 2022-08-17 Information Cascade Prediction System and Method Based on Transformer Enhanced Hawkes Process

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210983826.9A CN115409155B (en) 2022-08-17 2022-08-17 Information Cascade Prediction System and Method Based on Transformer Enhanced Hawkes Process

Publications (2)

Publication Number Publication Date
CN115409155A CN115409155A (en) 2022-11-29
CN115409155B true CN115409155B (en) 2026-01-09

Family

ID=84159909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210983826.9A Active CN115409155B (en) 2022-08-17 2022-08-17 Information Cascade Prediction System and Method Based on Transformer Enhanced Hawkes Process

Country Status (1)

Country Link
CN (1) CN115409155B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115858899B (en) * 2022-12-14 2025-07-11 中国人民解放军国防科技大学 A network event tag popularity prediction method based on multi-tag influence
CN115759482A (en) * 2022-12-29 2023-03-07 人民网股份有限公司 Social media content propagation prediction method and device
CN116596025B (en) * 2023-04-17 2024-09-10 苏州大学 A method and system for predicting information popularity based on cascade relationship diversity
CN116842398B (en) * 2023-06-27 2024-06-28 哈尔滨工业大学 A topic-aware information forwarding prediction method and system based on shielded self-attention network
CN117268763B (en) * 2023-09-21 2024-09-27 安徽大学 A rolling bearing life prediction method and system
CN119692524B (en) * 2023-11-14 2025-10-14 电子科技大学 A multimodal social media popularity prediction method based on time-aware hypergraph learning
CN119337704B (en) * 2024-09-23 2025-10-31 中国船舶集团有限公司第七一九研究所 Online viewpoint evolution simulation method and device based on offline event excitation model
CN119005321B (en) * 2024-10-23 2025-04-25 中国计量大学 Electromechanical equipment predictive maintenance method integrating time sequence knowledge graphs
CN119558751B (en) * 2024-12-02 2025-12-09 深圳大学 Supply chain distribution path planning method based on reinforcement learning
CN119558490B (en) * 2025-01-27 2025-07-22 北京邮电大学 Diffusion model-based early information propagation prediction method and system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114037549A (en) * 2021-10-09 2022-02-11 北京交通大学 Online generated content popularity prediction method based on neighbor perception representation learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9875443B2 (en) * 2015-06-18 2018-01-23 TCL Research America Inc. Unified attractiveness prediction framework based on content impact factor

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114037549A (en) * 2021-10-09 2022-02-11 北京交通大学 Online generated content popularity prediction method based on neighbor perception representation learning

Also Published As

Publication number Publication date
CN115409155A (en) 2022-11-29

Similar Documents

Publication Publication Date Title
CN115409155B (en) Information Cascade Prediction System and Method Based on Transformer Enhanced Hawkes Process
CN114116995B (en) Session recommendation method, system and medium based on enhanced graph neural network
CN113536144B (en) Social network information propagation scale prediction method and device
CN113868474A (en) An information cascade prediction method based on self-attention mechanism and dynamic graph
Xie et al. Unsupervised user identity linkage via factoid embedding
CN112101577A (en) XGboost-based cross-sample federal learning and testing method, system, device and medium
CN114117229A (en) An Item Recommendation Method Based on Directed and Undirected Structural Information of Graph Neural Networks
CN113946708A (en) Topic propagation prediction method based on image restoration technology and rumor splitting information
CN113591465B (en) Multi-dimensional IoC entity recognition method and device for network threat intelligence based on correlation enhancement
Jing et al. Casft: Future trend modeling for information popularity prediction with dynamic cues-driven diffusion models
CN116523118B (en) Multi-source information propagation prediction method and system based on heterogeneous graph neural network
Wu et al. Federated class-incremental learning via weighted aggregation and distillation
Zeng et al. Joint effects of context and user history for predicting online conversation re-entries
CN119962727B (en) Information propagation prediction method and system based on hypergraph neural network combined with text
Wen et al. To Predict or to Reject: Causal Effect Estimation with Uncertainty on Networked Data
Xie et al. Independent asymmetric embedding for information diffusion prediction on social networks
Wang et al. Combining macro and micro: feature-driven dynamic graph learning for social media popularity prediction
CN119293205A (en) A multi-task based aspect sentiment analysis method and system
Zhou et al. Modeling Personalized Retweeting Behaviors for Multi-Stage Cascade Popularity Prediction.
Zhao et al. A continuous-time diffusion model for inferring multi-layer diffusion networks: Y. Zhao et al.
Liu et al. Prediction model for non-topological event propagation in social networks
CN115600609A (en) A conversation recommendation method, storage medium and device based on item representation enhancement
Jing et al. On Your Mark, Get Set, Predict! Modeling Continuous-Time Dynamics of Cascades for Information Popularity Prediction
Hu et al. Federated learning for character prediction for text generation
CN118410347B (en) A social network account alignment method based on hypergraph embedding

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant