| Modern embedded computer systems are becoming increasingly complex, which makes software development and system verification key steps in system development. To speed up verification and program debugging, chip manufacturers are increasingly considering hardware implementations of program execution path tracing. Obtaining the program execution path is the most direct and effective way to understand the working state of a processor. However, complete program execution path information is very large, so sending it off chip directly is impractical and an effective compression mechanism is needed. In addition, chip pin resources are extremely valuable, so it is particularly important to output trace data through multiplexed pins. RISC-V is the fifth generation of the RISC instruction set architecture and is completely open source. It has spread rapidly since its launch in 2011. Because RISC-V is a new processor architecture, there has been little research on debugging for it, and hardware support for trace debugging is especially lacking. To address these problems, this paper proposes a trace debug hardware structure for RISC-V processors that captures and compresses program execution path information. The structure transforms program execution path information into a series of hit and miss events through three stages, which greatly reduces the amount of data that must be sent off chip. In the first stage, a block detector is designed to detect block descriptors. This unit makes full use of the characteristics of the RISC-V instruction set and effectively extends the block length to achieve a higher compression ratio. In the second stage, cache technology is combined with trace debug technology to design a block descriptor cache that converts each block descriptor into a block index. A multi-dimensional evaluation is carried out to balance trace performance against resource consumption. In the third stage, a last block predictor is designed to predict the block index. This unit exploits program locality to convert multi-bit data into a single-bit hit signal, which further reduces the amount of data. Thus, at the output of the block descriptor cache and the last block predictor, the sequence of block descriptors from the block detector has been converted into a sequence of hit and miss events. These events are encoded and encapsulated as trace packets by a trace message encoder and output over JTAG. The designed trace debug module is integrated into a RISC-V processor, and RISC-V benchmarks are used as the test suite for experiments on trace packet collection and program execution path reproduction. The experimental results show that the trace debug module achieves an average trace packet bandwidth of 0.163 bits/instruction and an average trace port bandwidth below 1 bit/instruction, which meets the requirements for single-port output. The software successfully parses the trace packets and reproduces the program execution path, meeting the functional requirements. On this basis, this paper optimizes the trace debug module and proposes two optimization schemes, one targeting bandwidth and one targeting area. The bandwidth optimization scheme further reduces the trace packet bandwidth to 0.140 bits/instruction by adding a start address predictor and an event counter. The area optimization scheme reduces the hardware area by 38.2% compared with the basic scheme by adding a starting address recorder and using hash inference, while the average trace packet bandwidth increases by only 4.3% compared with the bandwidth optimization scheme, which makes it cost-effective and practical. |
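
The three-stage flow described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the function and class names (`detect_blocks`, `DescriptorCache`, `compress`), the fixed 4-byte instruction size, the LRU replacement policy, the cache capacity, and the "last successor" prediction rule are all hypothetical choices made for illustration. In particular, the paper's block detector extends block length using RISC-V features that are not modelled here; in this sketch a block simply ends at any non-sequential program counter.

```python
from collections import OrderedDict


def detect_blocks(pcs, insn_size=4):
    """Stage 1 (assumed model): split a PC stream into (start, length) descriptors.

    A block ends whenever the next PC is not the sequential successor.
    """
    if not pcs:
        return []
    blocks = []
    start, length = pcs[0], 1
    for prev, cur in zip(pcs, pcs[1:]):
        if cur == prev + insn_size:
            length += 1
        else:
            blocks.append((start, length))
            start, length = cur, 1
    blocks.append((start, length))
    return blocks


class DescriptorCache:
    """Stage 2 (assumed model): map block descriptors to small indices, LRU eviction."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.slots = OrderedDict()  # descriptor -> index, oldest entry first

    def lookup(self, desc):
        """Return (hit, index); on a miss the descriptor is installed."""
        if desc in self.slots:
            self.slots.move_to_end(desc)
            return True, self.slots[desc]
        if len(self.slots) < self.capacity:
            index = len(self.slots)
        else:
            # Evict the least recently used descriptor and reuse its index.
            _, index = self.slots.popitem(last=False)
        self.slots[desc] = index
        return False, index


def compress(pcs, capacity=4):
    """Run all three stages and return the resulting list of trace events."""
    cache = DescriptorCache(capacity)
    last_successor = {}  # Stage 3 (assumed): index -> index that followed it last time
    prev_index = None
    events = []
    for desc in detect_blocks(pcs):
        hit, index = cache.lookup(desc)
        if not hit:
            events.append(("cache_miss", desc))    # full descriptor must be sent
        elif last_successor.get(prev_index) == index:
            events.append(("pred_hit",))           # a single bit is enough
        else:
            events.append(("pred_miss", index))    # send the block index
        if prev_index is not None:
            last_successor[prev_index] = index
        prev_index = index
    return events


if __name__ == "__main__":
    # A three-instruction loop body executed four times.
    trace = [0x100, 0x104, 0x108] * 4
    for event in compress(trace):
        print(event)
```

For a tight loop such as the one in the usage example, the first iteration costs a cache miss, the second a predictor miss, and each later iteration produces only a single hit event, which is the kind of locality the last block predictor is meant to exploit.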