KR20120004087A

KR20120004087A - Lockless memory controller for multiprocessor and multiprocessor system using the memory controller

Info

Publication number: KR20120004087A
Application number: KR1020100064750A
Authority: KR
Inventors: 오상윤; 백동명; 이정희; 이승우; 박영호; 이범철
Original assignee: 한국전자통신연구원
Priority date: 2010-07-06
Filing date: 2010-07-06
Publication date: 2012-01-12
Anticipated expiration: 2030-07-06
Also published as: KR101667426B1

Abstract

PURPOSE: A lock-free memory controller and a multiprocessor system using it are provided so that shared data can be processed in parallel without any processor using a lock function, allowing processing performance to increase linearly with the number of processors. CONSTITUTION: A memory (610) stores shared data. A processor (620) manages a data block of the memory as one data area. A memory controller (630) provides specific data to the processor according to a control signal from the processor. If the control signal is a write command, the memory controller reads the data at the write address from the memory into a temporary buffer, stores the write data in the corresponding bits of the temporary buffer, and stores the contents of the temporary buffer back in the memory.

Description

LOCK-FREE MEMORY CONTROLLER AND MULTIPROCESSOR SYSTEM USING THE LOCK-FREE MEMORY CONTROLLER

Embodiments of the present invention relate to a multiprocessor system.

The present invention is derived from research conducted as part of the IT industry source technology development project of the Ministry of Knowledge Economy and the Institute for Information Technology Advancement [Project management number: KI002197, Project title: Development of scalable micro-flow processing technology].

With advances in IT, the performance and capacity of a single processor have increased steadily, driven by higher clock speeds and denser transistor integration.

However, the gains from higher clock speeds and denser transistor integration have saturated. As a result, parallel processing, which improves performance by running multiple processors in parallel rather than by making a single processor faster, has become commonplace.

Parallel processing requires dividing the data to be processed into units that the processors can handle in parallel. Because the processors operate concurrently, several of them may try to use a shared resource at the same time.

In general, when processors need a shared resource at the same time, one processor locks the resource, uses it, and releases the lock; one of the remaining processors then locks the released resource and uses it in turn.

In this case the processors cannot use the shared resource simultaneously and must use it one after another, so parallel performance degrades until it is close to that of a single processor.

An object of one embodiment of the present invention is to improve the performance of parallel processing of shared data, without the processors using a lock function, by configuring a memory controller shared by a plurality of processors with a plurality of ports.

Another object of one embodiment of the present invention is to provide a system in which each processor can process shared data in parallel without using a lock function, so that processing performance increases linearly with the number of processors.

A multiprocessor system according to an embodiment of the present invention includes: a memory that stores shared data; a processor that manages one or more data blocks of the memory as one data area; and a memory controller that provides specific data among the shared data to the processor according to a control signal received from the processor. When the processor writes data that is referenced by another processor, it writes the data to one or more bit areas of the data block that are exclusive of the other processors, and when it reads the data, it reads the data of the entire one or more data blocks. When the control signal is a write command, the memory controller reads the data at the write address from the memory into a temporary buffer, stores the write data in the corresponding bits of the temporary buffer, and then stores the temporary buffer in the memory; when the control signal is a read command, it reads the data at the read address from the memory and provides it to the processor.

According to an embodiment of the present invention, each processor can process shared data in parallel without using a lock function, so processing performance can increase linearly with the number of processors.

FIG. 1 illustrates loop code on which a multiprocessor system according to an embodiment of the present invention is based.
FIG. 2 illustrates the code executed on each processor, on which a multiprocessor system according to an embodiment of the present invention is based.
FIG. 3 illustrates the initial stage of code execution on the processors, on which a multiprocessor system according to an embodiment of the present invention is based.
FIG. 4 illustrates the data structure used with a lock function in a conventional multiprocessor system.
FIG. 5 illustrates processor operation when a lock function is used in a conventional multiprocessor system.
FIG. 6 is a block diagram illustrating the configuration of a multiprocessor system according to an embodiment of the present invention.
FIG. 7 illustrates a data structure provided through a multiprocessor system according to an embodiment of the present invention.
FIG. 8 illustrates the process by which processors execute code in a multiprocessor system according to an embodiment of the present invention.
FIG. 9 illustrates processor operation in a multiprocessor system according to an embodiment of the present invention.
FIG. 10 illustrates the pattern inspection data structure of a packet inspector according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings and the content shown in them, but the present invention is not restricted or limited by these embodiments.

In describing the present invention, detailed descriptions of related well-known functions or configurations are omitted where they could unnecessarily obscure the gist of the invention. The terminology used in this specification has been chosen to describe the embodiments appropriately and may vary with the intent of users or operators or with the conventions of the field to which the invention belongs. The terms should therefore be defined on the basis of the specification as a whole.

FIG. 1 illustrates loop code on which a multiprocessor system according to an embodiment of the present invention is based.

As shown in FIG. 1, a multiprocessor system according to an embodiment of the present invention is based on a sequentially executed loop (a for statement, a while statement, or the like) whose iterations are to be run in parallel.

For example, as shown in FIG. 1, the loop code executes body code(s) sequentially while variable i changes from 0 to 99, so body code(s) runs 100 times in sequence. If two processors split the code of FIG. 1 and run it in parallel, processing speed can in theory be doubled; with four processors it can be quadrupled, and so on, so processing speed can increase linearly in proportion to the number of processors.

However, when multiple processors execute the loop code in parallel they share variable i, so data consistency breaks down: with two processors, body code(s) is executed 200 times instead of 100.

The code of FIG. 1, interpreted as the code executed on each processor, is described in more detail below.

FIG. 2 illustrates the code executed on each processor, and FIG. 3 illustrates the initial stage of code execution on the processors, on which a multiprocessor system according to an embodiment of the present invention is based.

For example, each processor reads variable i from shared memory (read), stops if it is 100 or more and otherwise continues (check), executes the code body code(s) (execute), increments variable i by 1 (modify), and stores the incremented variable i in shared memory.

When two processors execute the code in parallel as shown in FIG. 2, there is only one shared memory, so even if both processors request a read of variable i at the same time, the memory controller reads and supplies variable i sequentially. As shown in FIG. 3, processor 1 reads 0 as the value of variable i, and processor 2 then reads 0 from the same memory location.

For example, processor 1 in FIG. 3 executes the code at t = 2, then increments variable i and stores 1 in memory; processor 2 executes the code at t = 3, then increments variable i and also stores 1 in memory.

That is, at t = 4 the memory already holds 1, but processor 2 does not store the desired value 2. Instead it stores 1, obtained by incrementing the 0 it read earlier for the check. As a result each processor executes the code 100 times, for 200 executions in total.
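The lost update described above can be reproduced with a small deterministic simulation. The sketch below is illustrative only: it assumes a lockstep interleaving in which every processor reads i before any processor writes it back, and the function name `run_lockstep` and its parameters are hypothetical, not taken from the patent figures.

```python
def run_lockstep(limit=100, processors=2):
    """Simulate the interleaving of FIG. 3 under a lockstep assumption."""
    shared_i = 0      # the single shared loop counter
    executions = 0    # total number of times body code(s) ran
    while True:
        # read: the controller serves the reads one after another, but
        # all of them complete before the first write-back
        local = [shared_i for _ in range(processors)]
        # check: stop once every processor has seen i >= limit
        if all(value >= limit for value in local):
            break
        for value in local:
            if value < limit:
                executions += 1       # execute body code(s)
                shared_i = value + 1  # modify + write the stale value plus one
    return shared_i, executions


final_i, total = run_lockstep()
print(final_i, total)
```

With two processors the counter ends at 100 but the body runs 200 times, because each write overwrites the other processor's increment.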

FIG. 4 illustrates the data structure used with a lock function in a conventional multiprocessor system.

To prevent this increase in the number of executions, it is preferable for the processor to add a lock variable lock to the memory, as shown in FIG. 4. When lock is 0, the processor sets it to 1 to take the lock, reads variable i, executes the code, stores the incremented variable i in the memory, and resets lock to 0 so that another processor gets the opportunity to use variable i.

In the same way, the other processor uses the lock variable lock to run the loop with variable i, stores the incremented variable i in the memory, and resets lock to 0, giving other processors the opportunity to use variable i.

FIG. 5 illustrates processor operation when a lock function is used in a conventional multiprocessor system.

In this case, as shown in FIG. 5, the two processors take turns and stop after 100 executions in total. Because the two processors execute one after another, however, performance is similar to that of a single processor executing continuously.
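The lock-based flow can be sketched as follows. This is a minimal illustration, assuming Python's `threading.Lock` as a stand-in for the lock variable of FIG. 4; the names `state`, `executed`, and `processor` are hypothetical.

```python
import threading

LIMIT = 100
state = {"i": 0}          # shared loop counter
lock = threading.Lock()   # stands in for the lock variable of FIG. 4
executed = [0, 0]         # executions per processor, for inspection only


def processor(pid):
    while True:
        with lock:                   # take the lock (lock: 0 -> 1)
            if state["i"] >= LIMIT:  # read + check
                return
            executed[pid] += 1       # execute body code(s) while holding the lock
            state["i"] += 1          # modify + write
        # leaving the with-block releases the lock (lock: 1 -> 0)


threads = [threading.Thread(target=processor, args=(pid,)) for pid in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(state["i"], sum(executed))
```

The total is always 100, but the body only ever runs on one processor at a time, which is the serialization that FIG. 5 depicts.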

A multiprocessor system according to an embodiment of the present invention provides a configuration and method that improve on the processor structure and operating method described above.

FIG. 6 is a block diagram illustrating the configuration of a multiprocessor system according to an embodiment of the present invention, where n may be a natural number of 2 or more.

As shown in FIG. 6, a multiprocessor system according to an embodiment of the present invention consists of a memory 610 that stores shared data, a multiprocessor 620 that manages one or more data blocks of the memory 610 as one data area, and a memory controller 630 that provides specific data among the shared data to the multiprocessor 620 according to a control signal received from the multiprocessor 620.

According to one aspect of the present invention, when one of the processors of the multiprocessor 620 writes data that is referenced by another processor, it writes the data to one or more bit areas of the data block that are exclusive of the other processors, and when it reads the data, it reads the data of the entire data block.

According to one aspect of the present invention, when the control signal is a write command, the memory controller 630 reads the data at the write address from the memory 610 into a temporary buffer, stores the write data in the corresponding bits of the temporary buffer, and then stores the temporary buffer in the memory 610. When the control signal is a read command, it reads the data at the read address from the memory 610 and provides it to the processor that requested the read.

Here, the memory controller 630 may manage the data structure under a single address, and the memory 610 is preferably a memory module that supports burst mode.

The memory controller 630 according to one aspect of the present invention has a multi-port structure in order to provide an interface to the multiprocessor 620, and consists of a command queue 631 and a command executor 632.

The command queue 631 receives and stores read and write commands from the processors and delivers the read data for each read command to the requesting processor.

The command executor 632 decodes the commands from the command queue 631 and provides the corresponding data.

For example, for a write command, the command executor 632 reads the data at the write address from the memory 610 into a temporary buffer, stores the write data in the corresponding bits of the temporary buffer, and then stores the data of the temporary buffer in the memory 610.

As another example, for a read command, the command executor 632 reads the data at the read address from the memory 610 and provides it to the command queue 631.

According to one aspect of the present invention, the write data of each processor of the multiprocessor 620 may be stored in a region of the temporary buffer that is exclusive to that processor.
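The write path described above is a read-modify-write on one memory word. The sketch below shows one way it could look in software; the class `MemoryController`, the `shift` and `width` parameters, and the use of a Python list as the memory are all assumptions made for illustration and do not come from the patent.

```python
class MemoryController:
    """Toy model of the controller's read and write handling (hypothetical API)."""

    def __init__(self, words):
        self.memory = [0] * words  # stands in for memory 610

    def read(self, address):
        # read command: return the whole word at the read address
        return self.memory[address]

    def write(self, address, value, shift, width):
        # write command: only bits [shift, shift + width) belong to the caller
        mask = ((1 << width) - 1) << shift
        temp = self.memory[address]                        # read into the temporary buffer
        temp = (temp & ~mask) | ((value << shift) & mask)  # replace only the caller's bits
        self.memory[address] = temp                        # store the buffer back


controller = MemoryController(words=1)
controller.write(0, 0x12, shift=0, width=8)  # processor 1's field
controller.write(0, 0x34, shift=8, width=8)  # processor 2's field
print(hex(controller.read(0)))
```

Because each write touches only its own bit field, the second write leaves the first processor's value intact, and a single read returns both fields.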

FIG. 7 illustrates a data structure provided through a multiprocessor system according to an embodiment of the present invention.

As shown in FIG. 7, instead of variable i and the lock variable lock, the multiprocessor system according to one aspect of the present invention uses processor 1's variable i (i1) and processor 2's variable i (i2), and links i1 and i2 into a single data structure so that both can be accessed at the same address.

For example, when processor 1 or processor 2 requests a read of the variable, the system provides i1 and i2 together. The processor adds the values of i1 and i2, executes the code if the sum is less than 100, and stops executing the code once the sum is 100 or more.

When processors 1 and 2 request writes, data requested by processor 1 is written only to variable i1 and data requested by processor 2 is written only to variable i2, so data consistency is maintained.
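The per-processor counters can be simulated under the same lockstep assumption that exposes the lost update with a single shared counter. This is an illustrative sketch; `run_split_counters` and its parameters are hypothetical names.

```python
def run_split_counters(limit=100, processors=2):
    """Lockstep simulation with one counter per processor (i1, i2, ...)."""
    counters = [0] * processors  # i1, i2, ... held in one data structure
    executions = 0
    while True:
        # read: every processor receives the whole structure
        snapshots = [list(counters) for _ in range(processors)]
        # check: each processor compares the sum of all counters with the limit
        if all(sum(snapshot) >= limit for snapshot in snapshots):
            break
        for pid, snapshot in enumerate(snapshots):
            if sum(snapshot) < limit:
                executions += 1                   # execute body code(s)
                counters[pid] = snapshot[pid] + 1  # write only this processor's field
    return counters, executions


counters, total = run_split_counters()
print(counters, total)
```

In this model the two processors each run 50 iterations in parallel and the body runs 100 times in total, without a lock.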

FIG. 8 illustrates the process by which processors execute code in a multiprocessor system according to an embodiment of the present invention.

For example, as shown in FIG. 8, at t = 5 the value of variable i2 seen by processor 1 may be 0 or 1, depending on the service policy of the memory controller 630 and the instruction processing speed of the processors.

This discrepancy is likely to become more pronounced as the number of processors increases, and cases in which the number of executions exceeds 100 may occur frequently.

For example, if processor 1's check shows that 99 executions have been completed, processor 2 may execute in parallel while processor 1 proceeds with the 100th execution, so the total number of executions may be 101.

To address this, the multiprocessor system according to one aspect of the present invention may be designed so that one processor (processor 1) executes up to the 100th iteration while processor 2 is given a threshold. When the sum of variables i1 and i2 is below the threshold, processor 2 executes in parallel; when the sum is above the threshold, processor 2 does not execute and only checks the sum against the threshold. When processor 1 completes the 100th execution and stops, exactly 100 executions have been performed in total.
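The effect of the threshold can be illustrated with a two-processor lockstep model. Several things here are assumptions rather than details from the patent: the lockstep interleaving, the function name `run`, and the choice of an odd limit of 99 with a threshold of 98, which is used only because two processors in lockstep always advance the sum by 2 and therefore never overshoot an even limit in this simplified model.

```python
def run(limit, threshold):
    """Two processors in lockstep; processor 2 stops early at the threshold."""
    i1 = i2 = 0
    executions = 0
    while True:
        seen_by_p1 = i1 + i2  # both processors read the whole structure
        seen_by_p2 = i1 + i2  # in the same step, before either writes
        if seen_by_p1 >= limit:
            break
        i1 += 1               # processor 1 runs all the way up to the limit
        executions += 1
        if seen_by_p2 < threshold:
            i2 += 1           # processor 2 runs only below the threshold
            executions += 1
    return i1, i2, executions


print(run(limit=99, threshold=99))  # no effective threshold: overshoots
print(run(limit=99, threshold=98))  # threshold one below the limit
```

Without an effective threshold the body runs 100 times for a limit of 99; with the threshold, processor 2 sits out the last step and the total is exactly 99.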

FIG. 9 illustrates processor operation in a multiprocessor system according to an embodiment of the present invention.

Referring to FIG. 9, with the multiprocessor system configured as described above, execution time is reduced by half compared with the processor operation shown in FIG. 5.

According to an embodiment of the present invention, a packet inspector that inspects and processes network packets may be provided. Because a packet payload follows no fixed rule, pattern inspection takes a long time, so multiple processors are generally used to check whether patterns match.

The following describes, as an example, a case in which the packet inspector checks whether each of 100 patterns matches the payload of an input packet and counts the matches.

For example, when a single processor checks 100 patterns against a packet payload, inspection takes a long time. If two processors are used, the payload is copied to each, and each processor checks 50 of the patterns, the inspection time can be halved.

FIG. 10 illustrates the pattern inspection data structure of a packet inspector according to an embodiment of the present invention.

For example, in the packet inspection method according to an embodiment of the present invention, as shown in FIG. 10, variable i1 is the number of pattern-matching checks that processor 1 has performed and variable mc1 is the number of patterns that matched.

That is, i1 = 1 means that the first pattern has been checked and i1 = 50 means that 50 patterns have been checked. Variable mc1 accumulates the number of matches: when i1 is 50, mc1 = 0 means that no pattern matched, and mc1 = 5 means that five patterns matched the packet payload.

Likewise, variable i2 is the number of pattern-matching checks that processor 2 has performed and variable mc2 is the number of patterns that matched. When inspection of a packet begins, i1, i2, mc1, and mc2 are all initialized to 0.

Accordingly, processor 1 reads variables i1, i2, mc1, and mc2. If i1 and i2 are both 50, inspection is complete; if i1 is less than 50, processor 1 checks a pattern and then updates i1 and mc1 (mc1 is updated only when needed).

Processor 1 checks patterns until i1 reaches 50. Once i1 is 50 it checks i2; while i2 is less than 50 it waits and keeps checking i2, and it stops when i2 reaches 50.

In this case, when i1 and i2 are both 50, the cumulative number of matched patterns is obtained by adding mc1 and mc2.
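The inspection flow above can be sketched with two threads that each own one pair of fields. This is a hedged illustration: substring search stands in for pattern matching, Python lists stand in for the shared data structure, and the names `inspect`, `checked`, and `matched` are hypothetical. It also assumes the number of patterns divides evenly among the workers.

```python
import threading
import time


def inspect(payload, patterns, workers=2):
    """Count the patterns that occur in payload, split evenly across workers."""
    share = len(patterns) // workers  # 50 each for 100 patterns and 2 workers
    checked = [0] * workers           # i1, i2: patterns checked so far
    matched = [0] * workers           # mc1, mc2: matches found so far

    def processor(pid):
        for pattern in patterns[pid * share:(pid + 1) * share]:
            if pattern in payload:
                matched[pid] += 1     # writes touch only this processor's fields
            checked[pid] += 1
        # own share done: wait, re-reading the other counters until all reach share
        while any(count < share for count in checked):
            time.sleep(0)

    threads = [threading.Thread(target=processor, args=(pid,)) for pid in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return checked, sum(matched)      # total matches = mc1 + mc2


patterns = [f"sig{n:03d}" for n in range(100)]
payload = "header sig003 data sig017 more sig064 trailer"
checked, total = inspect(payload, patterns)
print(checked, total)
```

Each thread writes only its own entries in `checked` and `matched`, so no lock is needed to obtain a consistent total once both counters have reached 50.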

The configuration of a multiprocessor system according to an embodiment of the present invention can be summarized as follows.

Processors 1 to n (620) of the multiprocessor system are processors that run in parallel. The command queue 631 stores the read or write commands for the common data structure issued by each processor of the multiprocessor 620, provides them to the command executor 632, and delivers the read data from the command executor 632 to the processor that requested the read.

The command executor 632 checks the command queue 631 sequentially for valid commands. For a read command, it reads the data at the corresponding address of the memory 610 and provides it to the command queue 631. For a write command, it reads the data at the corresponding address of the memory 610 into a temporary buffer, replaces the bits in the temporary buffer that belong to the requesting processor with the write data, and writes the modified temporary buffer back to the memory.
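The interplay of the command queue and the command executor can be sketched as a simple loop over queued commands. The command tuple layout, the fixed 8-bit field per processor (`FIELD_BITS`), and the function name `execute_all` are assumptions made for this illustration and are not specified in the patent.

```python
from collections import deque

FIELD_BITS = 8  # assumed width of each processor's exclusive bit field


def execute_all(queue, memory):
    """Drain the command queue in order and return (pid, data) read replies."""
    replies = []
    while queue:
        op, pid, address, value = queue.popleft()
        if op == "read":
            # read command: return the whole word to the requesting processor
            replies.append((pid, memory[address]))
        else:
            # write command: read-modify-write limited to the requester's bits
            shift = pid * FIELD_BITS
            mask = ((1 << FIELD_BITS) - 1) << shift
            temp = memory[address]                             # temporary buffer
            temp = (temp & ~mask) | ((value << shift) & mask)  # swap in the write data
            memory[address] = temp                             # write back
    return replies


memory = [0]
queue = deque([
    ("write", 0, 0, 7),    # processor 0 writes its field
    ("write", 1, 0, 9),    # processor 1 writes its field
    ("read", 0, 0, None),  # processor 0 reads the whole word
])
replies = execute_all(queue, memory)
print(replies)
```

Since the executor handles one command at a time, each read-modify-write completes before the next command starts, which is what makes per-processor fields safe without processor-side locks in this model.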

The memory 610 according to one aspect of the present invention may be any of various memory devices, such as SRAM, SDRAM, DDR SDRAM, or a memory module, and it stores data from the command executor 632 or provides read data to it.

According to one aspect of the present invention, the variables of the data structure may each be managed at a different address, but because they are shared data it is preferable to manage them under a single address.

That is, when each variable is small, it is preferable to use SRAM or SDRAM and to use some of the bits of a data word for each variable; when each variable is large, it is preferable to use a memory module that supports burst mode.

For example, when the variables are small, one 32-bit data word can be used as four 8-bit variables. When the variables are large, a memory module with a 64-bit data width and a burst length of 8 allows a 64-byte data structure to be accessed at a single address, so up to sixteen 32-bit variables can be used.
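The sizing arithmetic above can be checked with a few lines of Python. The `struct` layouts below are only an illustration of how the fields might be packed; the little-endian byte order and the constant names are assumptions.

```python
import struct

# Small variables: one 32-bit word holds four 8-bit variables.
word = struct.pack("<4B", 1, 2, 3, 4)
word_bits = len(word) * 8

# Large variables: 64-bit data width with a burst length of 8.
BUS_BITS = 64
BURST_LENGTH = 8
block_bytes = BUS_BITS // 8 * BURST_LENGTH  # bytes transferred per burst
slots = block_bytes * 8 // 32               # number of 32-bit variables per burst

# One burst-sized block holding sixteen 32-bit variables.
block = struct.pack("<16I", *range(16))

print(word_bits, block_bytes, slots, len(block))
```

The numbers agree with the text: 32 bits for four 8-bit fields, and 64 bytes per burst for sixteen 32-bit fields.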

Embodiments of the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

Although the present invention has been described with reference to specific details such as particular components, and to limited embodiments and drawings, these are provided only to aid overall understanding of the invention. The present invention is not limited to the above embodiments, and those of ordinary skill in the field to which the invention belongs can make various modifications and variations from this description. Therefore, the spirit of the present invention should not be confined to the described embodiments; the claims set out below, and everything equal or equivalent to those claims, fall within the scope of the spirit of the invention.

610: memory
620: multiprocessor
630: memory controller

Claims

A memory for storing shared data;
a processor for managing one or more data blocks of the memory as one data area; and
a memory controller for providing specific data among the shared data to the processor according to a control signal received from the processor,
wherein the processor,
when writing data referenced by another processor, writes the data to one or more bit areas of the one or more data blocks that are exclusive of the other processor, and,
when reading the data, reads the data of the entire one or more data blocks, and
wherein the memory controller,
if the control signal is a write command, reads the data at the write address from the memory into a temporary buffer, stores the write data in the corresponding bits of the temporary buffer, and then stores the data of the temporary buffer in the memory, and,
if the control signal is a read command, reads the data at the read address from the memory and provides the data to the processor.