/***********************************************************************************************
TigerSHARC link boot loader
TS201_link.asm
This code is deliberately left unoptimized, with one instruction per line, for clarity.
Revision history:
11/1/2002 Boris Lerner Version 1.0.0
5/7/2003 Boris Lerner Version 1.0.1 Corrected ISR, length field of cache
9/15/2003 Boris Lerner Version 1.0.2 Finalized comments, tested on silicon
12/4/2003 Boris Lerner Version 1.0.3 Corrected writes to CCAIR - via IALU now
Algorithm description:
1. The TigerSHARC's link port DMA initially loads this boot loader into memory
0x00000000-0x000000FF. Once the loader is in place, the link port DMA interrupt wakes the
TigerSHARC up and execution of the loader starts at location 0x00000000. At this stage, the
TigerSHARC is at the interrupt level of the link port DMA, so further link port DMA
interrupts and global (SQSTAT[20]) interrupts are disabled.
2. The constant LINK selects the link port used for booting. In this code it is set to 1
(i.e. link port 1). To boot from a different link port (0-3), change the constant and
rebuild the code.
3. Note the code delimited by the labels __init_debug_start and __init_debug_end. This
code describes the final state of some of the processor's system registers after the
loader has finished booting all of the user code. When a DXE file is loaded directly
(bypassing the loader), the simulator and the emulator execute this code to put these
registers into the same state. The instruction in this block that matters for the
subsequent operation of the loader is RDS: it reduces the current interrupt level (that of
the link port DMA, see step 1) to subroutine level, thus allowing further link port DMA
interrupts. Before executing RDS;; it is very important to set the NMOD bit in the SQCTL
register so that the processor remains in supervisor mode. All link port DMAs are also
disabled. The actual loader code starts at label __init_debug_end.
4. The cache is disabled. This is not necessary for correct operation of the loader, but it
brings the cache to a known, clean state for when the user code takes over.
5. The loader sets the NMOD, TRCBEN and GIE bits in the SQCTL register to ensure supervisor
mode and to enable the trace buffer and global interrupts.
6. The link port receive DMA interrupt vector is set to _dma_int. The unused link ports are
cleared (their LRSTATn and LRSTATCn registers are read) and left disabled, and their link
port DMAs are disabled.
7. The receive control register of the selected link port (LRCTLn) is initialized with
LRCTL_REN.
8. The DMA that brings the data in from the link port transfers one quad word at a time.
XR3:0 are preset with the required TCB values.
9. Data from the link port is read by the routine _read_word. Data from the link always
arrives in quad-word format, but the processor needs to parse it one 32-bit word at a time,
so an internal FIFO buffer is maintained. It is implemented as a circular buffer in memory
locations 0x00-0x03. J2 is dedicated as the read pointer into the buffer, so J2, JB2 and
JL2 are all initialized accordingly. The execution flow of _read_word is:
a. J2 is checked to see whether it has wrapped back to 0 (i.e. all the data in the
buffer has been read). If it has not, execution goes to step "d" to read the next
word from the buffer.
b. Another quad word is brought into the buffer from the link port via link DMA.
The link port DMA is started by writing XR3:0 to the TCB, and the routine then waits
in IDLE for the DMA interrupt.
c. When the new quad word arrives from the link port, the DMA interrupt wakes the
processor from IDLE and execution branches to _dma_int, where a line of NOPs followed
by RTI;; returns it to the instruction immediately after the IDLE. The line of NOPs is
necessary because RTI;; is not allowed in the first quad word of an ISR.
d. Data from the buffer pointed to by J2 is read into xR4 and J2 is incremented
circularly.
10. Unlike other boot modes, the processor ID is not used here; this loader does not
support multiprocessor (MP) booting.
11. The loader now parses the blocks of data from the link port. Two words are read into
yR8 and J0; these are the tag words of the block that follows. In the first word, bits
31:30 are the block TYPE (0 = final init, 1 = non-zero init, 2 = zero init), bits 29:16 are
reserved, and bits 15:0 are the block COUNT. The second tag word is the DESTINATION pointer.
12. If TYPE is 1, COUNT words are moved one at a time via _read_word to DESTINATION. When
finished, the algorithm returns to step 11.
13. If TYPE is 2, COUNT zeros are written to DESTINATION. When finished, the algorithm
returns to step 11.
14. If TYPE is 0, the loader performs the final init, i.e. it overwrites itself with the
user code, using the following algorithm:
a. The first 28 instructions of the user code (destined for locations 0x00000000-0x0000001B)
are read from the link port via _read_word and held in registers xR31:8 and yR31:28.
b. The interrupt service routine at _dma_int is relocated to 0x04-0x08.
c. The 19 instructions of _last_patch_code are relocated to locations 0x09-0x1B.
d. The branch target buffer is invalidated (BTBINV) to clear cached branches.
e. The cache is re-enabled.
f. The link port DMA interrupt vector is set to 0x04 (the location now containing the
interrupt service routine as a result of step b).
g. yR0 is preset to SQCTL_NMOD | SQCTL_TRCBEN. This leaves global interrupts disabled
when the user code starts, while ensuring that the NMOD (supervisor mode) and TRCBEN
(trace buffer) bits are set. This value is written into SQCTL by the _last_patch_code
relocated to 0x09-0x1B.
h. J0 is initialized to 0x1C (the first location past the relocated _last_patch_code) and
LC0 is initialized to 0xE4 (the number of words of the final init still to be read).
i. At this stage, locations 0x04-0x1B are initialized as follows:
0x04: _relocated_dma_int:
nop; nop; nop; nop;;
rti(NP);;
0x09: _relocated_read_word:
comp(j2,0);; // if J2 -> start of the buffer...
if njeq, jump _relocated_read_buffer (NP);; // ...bring in more data
DCx = xr3:0;; // start the DMA
idle;; // wait till DMA interrupts
_relocated_read_buffer:
xr4 = cb[j2+=1];; // read the word from the buffer
cjmp (ABS) (NP);; // and return
0x0F: _relocated_final_init1:
call _read_word (NP);; // read word
[j0 += 1] = xr4;; // write it
if NLC0E, jump _relocated_final_init1 (NP);;
SQCTL = yr0;; // disable interrupts
nop;;
Q[j31 + 0] = xr11:8;; // overwrite 0x00-0x03
Q[j31 + 4] = xr15:12;; // overwrite 0x04-0x07
Q[j31 + 8] = xr19:16;; // overwrite 0x08-0x0b
Q[j31 + 0xc] = xr23:20;; // overwrite 0x0c-0x0f
Q[j31 + 0x10] = xr27:24;; // overwrite 0x10-0x13
Q[j31 + 0x14] = xr31:28;; // overwrite 0x14-0x17
jump 0 (ABS) (NP); Q[j31 + 0x18] = yr31:28;; // overwrite 0x18-0x1b, start at 0
j. The code execution jumps to 0x0F, i.e. _relocated_final_init1 shown above.
k. Locations 0x1C-0xFF are filled with data from the link port. Note that the call
_read_word (NP);; at _relocated_final_init1 is a relative call, so it actually calls
_relocated_read_word; overwriting the original _read_word therefore causes no problems.
l. Link reception is now finished: the correct data is in 0x1C-0xFF, and the data that
belongs in 0x00-0x1B is in registers xR31:8 and yR31:28. The remaining code overwrites
locations 0x00-0x17 with the data in xR31:8 and, finally, the last line of code overwrites
locations 0x18-0x1B (including itself) with the data from yR31:28 while executing an
absolute jump to 0x00.
m. The user code starts at 0x00 cleanly.
***********************************************************************************************/
#define LINK 1
//**********************************************************************************************
#include <defts201.h>
//**********************************************************************************************
.section seg_ldr;
//************************************** Start of code *****************************************
// The following code between the labels __init_debug_start and __init_debug_end will be
// executed by the simulator and the emulator after a software reset to match the processor's
// state past the boot loader.
//******************************* Start of Initialization code *********************************
__init_debug_start:
SQCTL = SQCTL_NMOD | SQCTL_TRCBEN;; // DBGEN, NMOD, TRCBEN set, global ints disabled
BTBEN;; // enable the BTB
// ~~~ 03-00-0359 WORK-AROUND ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ //
#if 1
MR1:0 += R1:0*R3:2;;
R4 = R0 + R1;
LBUFTX0 = XR3:0;;
LBUFTX0 = YR3:0;;
#endif
xr1 = 0x00040004;; // count = 4, modify = 4
xr3 = TCB_DISABLE;; // disable DMA, int mem,prio=norm,2D=no,word=quad,int=yes,RQ=enbl,chain=no
DC4 = xr3:0;; // disable DMA4
DC5 = xr3:0;; // disable DMA5
DC6 = xr3:0;; // disable DMA6
DC7 = xr3:0;; // disable DMA7
DC8 = xr3:0;; // disable DMA8
DC9 = xr3:0;; // disable DMA9
DC10 = xr3:0;; // disable DMA10
DC11 = xr3:0;; // disable DMA11
xr3 = TCB_NORMAL | TCB_INT;; // disable DMA0
DCS0 = xr3:0;;
DCD0 = xr3:0;;
j0 = j31 + 0x0;; // set index to 0 - broadcast
CCAIRALL = j0;;
CACMDALL = CACMD_INV | // invalidate...
(127 << CACMD_LEN_P) | // ...the entire cache...
CACMD_NOSTALL;; // ...abort (not stall) commands until done
call _wait_for_cache (NP);;
// ~~~ 03-00-0358 WORK-AROUND ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ //
#if 1
YR1:0 = MR1:0;;
#endif
CACMDALL = CACMD_EN;; // enable cache - broadcast
rds;; // reduce interrupt to subroutine level
__init_debug_end:
//******************************** End of Initialization code *********************************
// INSERT CUSTOM INITS HERE - DRAM REGISTERS ETC...
CACMDALL = CACMD_DIS;; // disable cache - broadcast
call _wait_for_cache (NP);;
SQCTL = SQCTL_NMOD | SQCTL_TRCBEN | SQCTL_GIE;; // DBGEN, NMOD, TRCBEN set, global ints enabled
j0 = j31 + _dma_int;;
yr0 = 0;; // will be used for zero init and ints disable in patch
xr1 = LRCTL_REN;;
#if LINK==0
IVDMA8 = j0;;
xr0 = LRSTAT1;;
xr0 = LRSTATC1;;
xr0 = LRSTAT2;;
xr0 = LRSTATC2;;
xr0 = LRSTAT3;;
xr0 = LRSTATC3;;
LRCTL0 = xr1;;
#endif
#if LINK==1
IVDMA9 = j0;;
xr0 = LRSTAT0;;
xr0 = LRSTATC0;;
xr0 = LRSTAT2;;
xr0 = LRSTATC2;;
xr0 = LRSTAT3;;
xr0 = LRSTATC3;;
LRCTL1 = xr1;;
#endif
#if LINK==2
IVDMA10 = j0;;
xr0 = LRSTAT0;;
xr0 = LRSTATC0;;
xr0 = LRSTAT1;;
xr0 = LRSTATC1;;
xr0 = LRSTAT3;;
xr0 = LRSTATC3;;
LRCTL2 = xr1;;
#endif
#if LINK==3
IVDMA11 = j0;;
xr0 = LRSTAT0;;
xr0 = LRSTATC0;;
xr0 = LRSTAT1;;
xr0 = LRSTATC1;;
xr0 = LRSTAT2;;
xr0 = LRSTATC2;;
LRCTL3 = xr1;;
#endif