1. Problem phenomenon
Debugging environment: single board (AM335X+linux) --------(GE Ethernet)------------ Controller (AM5728+vxworks)
Error scenario: The board acts as an ftp client(172.16.89.4), and the controller acts as an ftp server(172.16.89.71). When the board downloads ftp data, the speed of the board will often suddenly slow down, and sometimes it will be extremely stuck, the worst The case will fail to transmit.
m_Ftp_Connect_Login_Server_Function mmmmm "172.16.89.71" 21 wait for command finished 0 connecting to the Ftp server... "FTP now stateeeeeee:" 1 "hostname lookup in progress" "FTP now stateeeeeee:" 2 "trying to connect to host" "FTP now stateeeeeee:" 3 "A connection to the host has been achieved" wait for command finished 1 m_ftp->currentCommand() 3 "Unknown error" connectToHost the FTP_server is successfully connected logging in... "Unknown error" wait for command finished 2 "FTP now stateeeeeee:" 4 "Connection and user login implemented" m_ftp->currentCommand() 4 "Unknown error" Login the login FTP_server is successful 11111------------------------------------------------------------------------------------------ start!-----------------------------------------------------------!start uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu "/mmc1:2/robot-folder" cd file from server... m_ftp->currentCommand() 7 "Unknown error" cd open file path successfully m_ftp->currentCommand() 6 "Unknown error" close cd file from server... m_ftp->currentCommand() 7 "Unknown error" cd open file path successfully ...
2. Analysis process
Because the board is used as the ftp client and other control boards have not encountered such problems in the process of transmission, we focus on the current controller board side.
Step 1. Check system overload
The first thing to suspect is whether the task load is too heavy when the server side is stuck, causing the scheduling to fail.
First, start the local application of vxworks, and open the ftp download program of the board.
// start the app cd"/mmc1:1" ld<pruss_eth.out prussStart ld<codesyscontrol.out PlcStart"CODESYSControl.cfg"
We checked the cpu usage of vxworks through the spy command, and found that the cpu usage was very low when the stuttering phenomenon was reproduced, and the idle was over 90%:
-> spy value = 548438048 = 0x20b08020 = ' ' -> NAME ENTRY TID PRI total % (ticks) delta % (ticks) ------------ ------------ ---------- --- --------------- --------------- tIsr0 0x3ec07c 0x2027f380 0 0% ( 1) 0% ( 1) tJobTask 0x30eba4 0x204f35c0 0 0% ( 0) 0% ( 0) tExcTask 0x30e544 0x5461a8 0 0% ( 0) 0% ( 0) tLogTask logTask 0x204f3b08 0 0% ( 0) 0% ( 0) tShell0 shellTask 0x20446c80 1 0% ( 0) 0% ( 0) tHrfsCommit: hrfsCommitTa 0x20473b10 2 0% ( 0) 0% ( 0) tSpyTask spyComTask 0x20b08020 5 0% ( 9) 0% ( 3) tErfTask 0x255664 0x2022aae8 10 0% ( 0) 0% ( 0) tVxdbgTask 0x298384 0x20431868 25 0% ( 0) 0% ( 0) CAAEventTask 0x108da24 0x20a7c430 30 0% ( 0) 0% ( 0) SchedExcepti 0x108da24 0x20a7c020 31 0% ( 0) 0% ( 0) tDmaJobTask jobQueueProc 0x20000640 45 0% ( 0) 0% ( 0) tTcfEvents 0x3bba88 0x551668 49 0% ( 0) 0% ( 0) tTcf 0x2bfa60 0x20449238 49 0% ( 1) 0% ( 0) tTcf 0x2bfa60 0x20455858 49 0% ( 0) 0% ( 0) tTcf 0x2bfa60 0x20458a90 49 0% ( 0) 0% ( 0) tNet0 ipcomNetTask 0x20232630 50 0% ( 0) 0% ( 0) tNetConf 0x1b0974 0x204165c0 50 0% ( 0) 0% ( 0) ipftps ipftps_main 0x20428348 50 0% ( 0) 0% ( 0) tAnalysisAge cafe_event_t 0x53dea8 50 0% ( 0) 0% ( 0) SchedProcess 0x108da24 0x20a68af0 64 0% ( 5) 0% ( 2) xHCD_IH0 usbXhcdInter 0x204db0d0 90 0% ( 0) 0% ( 0) xHCD_IH1 usbXhcdInter 0x205d5348 90 0% ( 0) 0% ( 0) BlkDrvUdp 0x108da24 0x20a90020 95 0% ( 1) 0% ( 0) BlkDrvTcp 0x108da24 0x20a90298 95 0% ( 5) 0% ( 5) tiCardMon 0x109924 0x202870d0 100 0% ( 0) 0% ( 0) tiCardMon 0x109924 0x202879d8 100 0% ( 0) 0% ( 0) BusM A usbHubThread 0x204c54a0 100 0% ( 2) 0% ( 0) BusM B usbHubThread 0x204c5c08 100 0% ( 0) 0% ( 0) BusM C usbHubThread 0x204eb7a0 100 0% ( 2) 0% ( 0) BusM D usbHubThread 0x205d5050 100 0% ( 0) 0% ( 0) tTffsPTask flPollTask 0x206929b0 100 0% ( 0) 0% ( 0) /mmc1XbdSvc 0x30c984 0x2045e010 100 0% ( 0) 0% ( 0) SupervisorCy 0x108da24 0x20aa4020 128 0% ( 0) 0% ( 0) OPCUAServer 0x108da24 0x20aa4ad8 128 0% ( 2) 0% ( 0) tUsb2Clr 0x3d1aa8 0x20692d10 150 0% ( 0) 0% ( 0) WebServerClo 0x108da24 0x20a7c788 150 0% ( 1) 0% ( 0) CommCtrl CommControlL 0x20491850 161 0% ( 0) 0% ( 0) vnSyncer vn_sync_task 0x20464988 200 0% ( 0) 0% ( 0) miiBusMonito 0x24f3b8 0x204f32a0 252 0% ( 3) 0% ( 0) MainLoop 0x108da24 0x20a90510 254 0% ( 0) 0% ( 0) KERNEL 0% ( 4) 0% ( 4) INTERRUPT 0% ( 0) 0% ( 0) IDLE 98% ( 2045) 97% ( 512) TOTAL 98% ( 2081) 97% ( 527)
Summary: It has nothing to do with system load.
Step 2. Check the file system read and write speed
Could it be that the read and write speed of the file system is slowing down the ftp speed.
There are two partitions under vxworks, "/mmc1:1" is the dosfs file format, "/mmc1:2" is the hrfs file format:
xbdCreatePartition ("/mmc1:1", 2, 50, 0, 0) dosFsVolFormat ("/mmc1:1", 0, 0) hrfsFormat ("/mmc1:2", 0ll, 0, 0)
We tested the file read and write speed of the two partitions and found that the read speed was okay, but the write speed was very slow.
[vxWorks *]# ls -l ... -rwxrwxrwx 1 0 0 21774591 Jan 1 1980 codesyscontrol.out -> timex(cp,"/mmc1:1/codesyscontrol.out","/mmc1:1/codesyscontrol.out.bak") copying file /mmc1:1/codesyscontrol.out -> /mmc1:1/codesyscontrol.out.bak timex: time of execution = 17873 +/- 0 (0%) millisecs value = 54 = 0x36 = '6' -> -> -> timex(cp,"/mmc1:2/codesyscontrol.out","/mmc1:2/codesyscontrol.out.bak") copying file /mmc1:2/codesyscontrol.out -> /mmc1:2/codesyscontrol.out.bak timex: time of execution = 137 +/- 0 (0%) secs value = 47 = 0x2f = '/' ->
ftp uses the "/mmc1:2" partition. When there is a failure, it is only performing ftp download. For the file system, it is a read operation, but is the read operation blocked by the write operation elsewhere?
Summary: This is also a point of suspicion, but other subsequent tests ruled it out.
Step 3. Check network port 0 and network port 1
There are two GE network ports on the ftp server side, and the network port 0 is always used for ftp download.
When the stuck failure recurs, the ftp communication between the board and AM5728 through network port 0(cpsw0) is very stuck. At this time, the ftp communication between the board and AM5728 through network port 1(cpsw1) and AM5728 is very smooth. , change the pc to network port 0(cpsw0)ftp transmission, the same freeze.
This has eliminated many factors at once, indicating that the fault is concentrated on the network port 0 transmission chain of AM5728: network card drivers, network protocol stacks, and hardware wiring are all suspected.
Summary: Narrowing down the positioning range, the fault is concentrated on the network port 0 transmission chain of the AM5728.
Step 4. Check the network card driver and network protocol stack
Change the board to ftp communication with AM5728 through network port 1(cpsw1).
Using the same ftp test, there was no problem in the continuous test for one day.
Because the network card driver and network protocol stack of network port 0 and network port 1 are basically the same, the hardware wiring of network port 0 is more suspicious.
Summary: If the scope is further narrowed, the trace signal is suspected, but there is still a lack of direct evidence.
Step 5. Check the wiring signal of network port 0
We plan to read the mac layer statistics of network port 0 to see its error statistics. If there are various errors caused by the quality of the trace signal, the error statistics should be seen here.
Because the native driver code does not support, we add the corresponding function:
diff --git a/vxbFdtTiCpswEnd.c b/vxbFdtTiCpswEnd.c index e60c507..4c32f0e 100644 (file) --- a/vxbFdtTiCpswEnd.c +++ b/vxbFdtTiCpswEnd.c @@ -256,6 +257,7 @@ LOCAL char * rxChanErrMsg [] = /* common resource which is shared by all port */ LOCAL CPSW_SW_CTRL * pSwCtrl = NULL; +CPSW_STAT tmp_cpswStat; /* import functions */ @@ -2872,6 +2874,70 @@ LOCAL STATUS cpswEndStatsDump return OK; } +STATUS cpswEndStatsShow + ( + ) + { + int i; + VXB_DEV_ID pDev = pSwCtrl->pDev; + UINT32 * temp = (UINT32 *)&tmp_cpswStat; + + /* + * hardware statistic counters are write-to-decrement, + * after a read, we write the value read to clear + * the counters + */ + + for (i = 0; i < sizeof (CPSW_STAT) / sizeof (UINT32); i++) + { + *temp = CSR_READ_4 (pDev, pSwCtrl->statsOffset + i*4); + //CSR_WRITE_4 (pDev, pSwCtrl->statsOffset + i*4, *temp); + temp++; + } + + (* _func_kprintf)("cpswMacStat rxgood = %d\n", tmp_cpswStat.rxgood); + (* _func_kprintf)("cpswMacStat rxbroadcast = %d\n", tmp_cpswStat.rxbroadcast); + (* _func_kprintf)("cpswMacStat rxmulticast = %d\n", tmp_cpswStat.rxmulticast); + (* _func_kprintf)("cpswMacStat rxpause = %d\n", tmp_cpswStat.rxpause); + (* _func_kprintf)("cpswMacStat rxcrcerros = %d\n", tmp_cpswStat.rxcrcerros); + (* _func_kprintf)("cpswMacStat rxalignmenterrors = %d\n", tmp_cpswStat.rxalignmenterrors); + (* _func_kprintf)("cpswMacStat rxoversized = %d\n", tmp_cpswStat.rxoversized); + (* _func_kprintf)("cpswMacStat rxjabber = %d\n", tmp_cpswStat.rxjabber); + (* _func_kprintf)("cpswMacStat rxundersized = %d\n", tmp_cpswStat.rxundersized); + (* _func_kprintf)("cpswMacStat rxfrags = %d\n", tmp_cpswStat.rxfrags); + (* _func_kprintf)("cpswMacStat rxoctets = %d\n\n", tmp_cpswStat.rxoctets); + + (* _func_kprintf)("cpswMacStat txgood = %d\n", tmp_cpswStat.txgood); + (* _func_kprintf)("cpswMacStat txbroadcast = %d\n", tmp_cpswStat.txbroadcast); + (* _func_kprintf)("cpswMacStat txmulticast = %d\n", tmp_cpswStat.txmulticast); + (* _func_kprintf)("cpswMacStat txpause = %d\n", tmp_cpswStat.txpause); + (* _func_kprintf)("cpswMacStat txdefered = %d\n", tmp_cpswStat.txdefered); + (* _func_kprintf)("cpswMacStat txcollision = %d\n", tmp_cpswStat.txcollision); + (* _func_kprintf)("cpswMacStat txsinglecol = %d\n", tmp_cpswStat.txsinglecol); + (* _func_kprintf)("cpswMacStat txmulticol = %d\n", tmp_cpswStat.txmulticol); + (* _func_kprintf)("cpswMacStat txexceesive = %d\n", tmp_cpswStat.txexceesive); + (* _func_kprintf)("cpswMacStat txlatecol = %d\n", tmp_cpswStat.txlatecol); + (* _func_kprintf)("cpswMacStat txunderrun = %d\n", tmp_cpswStat.txunderrun); + (* _func_kprintf)("cpswMacStat txcariersense = %d\n", tmp_cpswStat.txcariersense); + (* _func_kprintf)("cpswMacStat txoctets = %d\n\n", tmp_cpswStat.txoctets); + + (* _func_kprintf)("cpswMacStat sz64octets = %d\n", tmp_cpswStat.sz64octets); + (* _func_kprintf)("cpswMacStat sz65_127octets = %d\n", tmp_cpswStat.sz65_127octets); + (* _func_kprintf)("cpswMacStat sz128_255octets = %d\n", tmp_cpswStat.sz128_255octets); + (* _func_kprintf)("cpswMacStat sz256_511octets = %d\n", tmp_cpswStat.sz256_511octets); + (* _func_kprintf)("cpswMacStat sz512_1023octets = %d\n", tmp_cpswStat.sz512_1023octets); + (* _func_kprintf)("cpswMacStat sz1024octets = %d\n", tmp_cpswStat.sz1024octets); + (* _func_kprintf)("cpswMacStat netoctets = %d\n", tmp_cpswStat.netoctets); + (* _func_kprintf)("cpswMacStat rxfifooverrun = %d\n", tmp_cpswStat.rxfifooverrun[0]); + (* _func_kprintf)("cpswMacStat rxfifooverrun = %d\n", tmp_cpswStat.rxfifooverrun[1]); + (* _func_kprintf)("cpswMacStat rxfifooverrun = %d\n\n", tmp_cpswStat.rxfifooverrun[2]); + + (* _func_kprintf)("cpswMacStat port 0 cpswInDropped = %lld\n", pSwCtrl->port[0]->cpswInDropped); + (* _func_kprintf)("cpswMacStat port 1 cpswInDropped = %lld\n\n\n", pSwCtrl->port[1]->cpswInDropped); + + return OK; + } + /******************************************************************************* * * cpswEndIoctl - the driver I/O control routine
In normal transmission, occasionally there is a case of receiving crc error:
-> cpswEndStatsShow ... cpswEndStatsShow,2900, rxgood = 1023 cpswEndStatsShow,2901, rxbroadcast = 0 cpswEndStatsShow,2902, rxmulticast = 0 cpswEndStatsShow,2903, rxpause = 0 cpswEndStatsShow,2904, rxcrcerros = 17 cpswEndStatsShow,2905, rxalignmenterrors = 0 cpswEndStatsShow,2906, rxoversized = 0 cpswEndStatsShow,2907, rxjabber = 0 cpswEndStatsShow,2908, rxundersized = 0 cpswEndStatsShow,2909, rxfrags = 0 cpswEndStatsShow,2910, rxoctets = 83903
In the case of extreme lag, the received crc error increases greatly:
-> cpswEndStatsShow ... cpswEndStatsShow,2900, rxgood = 5544 cpswEndStatsShow,2901, rxbroadcast = 181 cpswEndStatsShow,2902, rxmulticast = 158 cpswEndStatsShow,2903, rxpause = 0 cpswEndStatsShow,2904, rxcrcerros = 109 cpswEndStatsShow,2905, rxalignmenterrors = 0 cpswEndStatsShow,2906, rxoversized = 0 cpswEndStatsShow,2907, rxjabber = 0 cpswEndStatsShow,2908, rxundersized = 0 cpswEndStatsShow,2909, rxfrags = 0 cpswEndStatsShow,2910, rxoctets = 433598
Construct a packet with full FF content, send 100, and send 100 errors:
-> cpswEndStatsShow ... cpswEndStatsShow,2900, rxgood = 9844 cpswEndStatsShow,2901, rxbroadcast = 613 cpswEndStatsShow,2902, rxmulticast = 379 cpswEndStatsShow,2903, rxpause = 0 cpswEndStatsShow,2904, rxcrcerros = 272 cpswEndStatsShow,2905, rxalignmenterrors = 0 cpswEndStatsShow,2906, rxoversized = 0 cpswEndStatsShow,2907, rxjabber = 0 cpswEndStatsShow,2908, rxundersized = 0 cpswEndStatsShow,2909, rxfrags = 0 cpswEndStatsShow,2910, rxoctets = 764175 -> cpswEndStatsShow ... cpswEndStatsShow,2900, rxgood = 9877 cpswEndStatsShow,2901, rxbroadcast = 635 cpswEndStatsShow,2902, rxmulticast = 387 cpswEndStatsShow,2903, rxpause = 0 cpswEndStatsShow,2904, rxcrcerros = 374 cpswEndStatsShow,2905, rxalignmenterrors = 0 cpswEndStatsShow,2906, rxoversized = 0 cpswEndStatsShow,2907, rxjabber = 0 cpswEndStatsShow,2908, rxundersized = 0 cpswEndStatsShow,2909, rxfrags = 0 cpswEndStatsShow,2910, rxoctets = 766815
The same test, network port 1 tested for a day without crc error.
Summary: It is indeed a problem with the signal quality of the hardware wiring, which causes data errors during transmission.
Step 6. Network port 0 slowdown test attempt
We try to reduce the speed of network port 0 to 100M to test whether the speed reduction can eliminate the error.
Because the native driver does not have a similar function, we also have to write one manually:
diff --git a/vxbFdtTiCpswEnd.c b/vxbFdtTiCpswEnd.c index 4c32f0e..71afa1a 100644 (file) --- a/vxbFdtTiCpswEnd.c +++ b/vxbFdtTiCpswEnd.c @@ -2938,6 +2946,126 @@ STATUS cpswEndStatsShow return OK; } +STATUS cpswEndLinkSet +( + UINT32 port, + UINT32 mode +) +{ + CPSW_DRV_CTRL * pDrvCtrl; + + if ((port > 1) || (mode > 4)) + return -1; + + pDrvCtrl = pSwCtrl->port[port]; + + switch(mode) + { + case 0: + (void)miiBusModeSet (pDrvCtrl->cpswMiiDev, IFM_ETHER | IFM_AUTO); + (* _func_kprintf)("set port %d mode: AUTO_SEL\n"); + break; + case 1: + (void)miiBusModeSet (pDrvCtrl->cpswMiiDev, IFM_ETHER|IFM_100_TX|IFM_FDX); + (* _func_kprintf)("set port %d mode: 100 FDX\n"); + break; + case 2: + (void)miiBusModeSet (pDrvCtrl->cpswMiiDev, IFM_ETHER|IFM_100_TX); + (* _func_kprintf)("set port %d mode: 100 HDX\n"); + break; + case 3: + (void)miiBusModeSet (pDrvCtrl->cpswMiiDev, IFM_ETHER|IFM_10_T|IFM_FDX); + (* _func_kprintf)("set port %d mode: 10 FDX\n"); + break; + case 4: + (void)miiBusModeSet (pDrvCtrl->cpswMiiDev, IFM_ETHER|IFM_10_T); + (* _func_kprintf)("set port %d mode: 10 HDX\n"); + break; + default: + break; + } + + return OK; +}
Forcibly set network port 0 to 100M full-duplex mode:
-> cpswEndLinkSet(0, 1)
The network port 0 is slowed down to 100M mode, and there is no crc error after testing for a long time.
Summary: Slowing down can solve the problem of crc error and is a workaround.
3. Conclusion
To sum up, the root cause of ftp slowdown is a large number of wrong packets and packet loss caused by unstable hardware signals.
The current workaround solution can reduce the speed of network port 0 to 100M for use.
The fundamental solution requires hardware re-wiring and re-simulation. After the single board is completed, a complete signal test needs to be done, so that the high-speed signal can have a solid and stable foundation.