vxworks issue: FTP tansfer slowly

1. Problem phenomenon

Debugging environment: single board (AM335X+linux) --------(GE Ethernet)------------ Controller (AM5728+vxworks)

Error scenario: The board acts as an ftp client(172.16.89.4), and the controller acts as an ftp server(172.16.89.71). When the board downloads ftp data, the speed of the board will often suddenly slow down, and sometimes it will be extremely stuck, the worst The case will fail to transmit.

m_Ftp_Connect_Login_Server_Function 
mmmmm "172.16.89.71" 21 
wait for command finished 0 
connecting to the Ftp server... 
"FTP now stateeeeeee:" 1 "hostname lookup in progress" 
"FTP now stateeeeeee:" 2 "trying to connect to host" 
"FTP now stateeeeeee:" 3 "A connection to the host has been achieved" 
wait for command finished 1 
m_ftp->currentCommand() 3 "Unknown error" 
connectToHost 
the FTP_server is successfully connected 
logging in... 
"Unknown error" 
wait for command finished 2 
"FTP now stateeeeeee:" 4 "Connection and user login implemented" 
m_ftp->currentCommand() 4 "Unknown error" 
Login 
the login FTP_server is successful 
11111------------------------------------------------------------------------------------------ 
start!-----------------------------------------------------------!start 
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu "/mmc1:2/robot-folder" 
cd file from server... 
m_ftp->currentCommand() 7 "Unknown error" 
cd 
open file path successfully 
m_ftp->currentCommand() 6 "Unknown error" 
close 
cd file from server... 
m_ftp->currentCommand() 7 "Unknown error" 
cd 
open file path successfully 

...

2. Analysis process

Because the board is used as the ftp client and other control boards have not encountered such problems in the process of transmission, we focus on the current controller board side.

Step 1. Check system overload

The first thing to suspect is whether the task load is too heavy when the server side is stuck, causing the scheduling to fail.

First, start the local application of vxworks, and open the ftp download program of the board.

// start the app
cd"/mmc1:1"
ld<pruss_eth.out
prussStart
ld<codesyscontrol.out
PlcStart"CODESYSControl.cfg"

We checked the cpu usage of vxworks through the spy command, and found that the cpu usage was very low when the stuttering phenomenon was reproduced, and the idle was over 90%:

-> spy
value = 548438048 = 0x20b08020 = ' '
-> 
    NAME         ENTRY       TID      PRI   total % (ticks)  delta % (ticks)
------------ ------------ ----------  ---   ---------------  ---------------
tIsr0        0x3ec07c     0x2027f380    0     0% (       1)    0% (       1)
tJobTask     0x30eba4     0x204f35c0    0     0% (       0)    0% (       0)
tExcTask     0x30e544     0x5461a8      0     0% (       0)    0% (       0)
tLogTask     logTask      0x204f3b08    0     0% (       0)    0% (       0)
tShell0      shellTask    0x20446c80    1     0% (       0)    0% (       0)
tHrfsCommit: hrfsCommitTa 0x20473b10    2     0% (       0)    0% (       0)
tSpyTask     spyComTask   0x20b08020    5     0% (       9)    0% (       3)
tErfTask     0x255664     0x2022aae8   10     0% (       0)    0% (       0)
tVxdbgTask   0x298384     0x20431868   25     0% (       0)    0% (       0)
CAAEventTask 0x108da24    0x20a7c430   30     0% (       0)    0% (       0)
SchedExcepti 0x108da24    0x20a7c020   31     0% (       0)    0% (       0)
tDmaJobTask  jobQueueProc 0x20000640   45     0% (       0)    0% (       0)
tTcfEvents   0x3bba88     0x551668     49     0% (       0)    0% (       0)
tTcf         0x2bfa60     0x20449238   49     0% (       1)    0% (       0)
tTcf         0x2bfa60     0x20455858   49     0% (       0)    0% (       0)
tTcf         0x2bfa60     0x20458a90   49     0% (       0)    0% (       0)
tNet0        ipcomNetTask 0x20232630   50     0% (       0)    0% (       0)
tNetConf     0x1b0974     0x204165c0   50     0% (       0)    0% (       0)
ipftps       ipftps_main  0x20428348   50     0% (       0)    0% (       0)
tAnalysisAge cafe_event_t 0x53dea8     50     0% (       0)    0% (       0)
SchedProcess 0x108da24    0x20a68af0   64     0% (       5)    0% (       2)
xHCD_IH0     usbXhcdInter 0x204db0d0   90     0% (       0)    0% (       0)
xHCD_IH1     usbXhcdInter 0x205d5348   90     0% (       0)    0% (       0)
BlkDrvUdp    0x108da24    0x20a90020   95     0% (       1)    0% (       0)
BlkDrvTcp    0x108da24    0x20a90298   95     0% (       5)    0% (       5)
tiCardMon    0x109924     0x202870d0  100     0% (       0)    0% (       0)
tiCardMon    0x109924     0x202879d8  100     0% (       0)    0% (       0)
BusM A       usbHubThread 0x204c54a0  100     0% (       2)    0% (       0)
BusM B       usbHubThread 0x204c5c08  100     0% (       0)    0% (       0)
BusM C       usbHubThread 0x204eb7a0  100     0% (       2)    0% (       0)
BusM D       usbHubThread 0x205d5050  100     0% (       0)    0% (       0)
tTffsPTask   flPollTask   0x206929b0  100     0% (       0)    0% (       0)
/mmc1XbdSvc  0x30c984     0x2045e010  100     0% (       0)    0% (       0)
SupervisorCy 0x108da24    0x20aa4020  128     0% (       0)    0% (       0)
OPCUAServer  0x108da24    0x20aa4ad8  128     0% (       2)    0% (       0)
tUsb2Clr     0x3d1aa8     0x20692d10  150     0% (       0)    0% (       0)
WebServerClo 0x108da24    0x20a7c788  150     0% (       1)    0% (       0)
CommCtrl     CommControlL 0x20491850  161     0% (       0)    0% (       0)
vnSyncer     vn_sync_task 0x20464988  200     0% (       0)    0% (       0)
miiBusMonito 0x24f3b8     0x204f32a0  252     0% (       3)    0% (       0)
MainLoop     0x108da24    0x20a90510  254     0% (       0)    0% (       0)
KERNEL                                        0% (       4)    0% (       4)
INTERRUPT                                     0% (       0)    0% (       0)
IDLE                                         98% (    2045)   97% (     512)
TOTAL                                        98% (    2081)   97% (     527)

Summary: It has nothing to do with system load.

Step 2. Check the file system read and write speed

Could it be that the read and write speed of the file system is slowing down the ftp speed.

There are two partitions under vxworks, "/mmc1:1" is the dosfs file format, "/mmc1:2" is the hrfs file format:

xbdCreatePartition ("/mmc1:1", 2, 50, 0, 0)
dosFsVolFormat ("/mmc1:1", 0, 0)
hrfsFormat ("/mmc1:2", 0ll, 0, 0)

We tested the file read and write speed of the two partitions and found that the read speed was okay, but the write speed was very slow.

[vxWorks *]# ls -l
... 
-rwxrwxrwx  1 0       0             21774591 Jan  1  1980 codesyscontrol.out 

-> timex(cp,"/mmc1:1/codesyscontrol.out","/mmc1:1/codesyscontrol.out.bak")
copying file /mmc1:1/codesyscontrol.out -> /mmc1:1/codesyscontrol.out.bak
timex: time of execution = 17873 +/- 0 (0%) millisecs
value = 54 = 0x36 = '6'
-> 
-> 
-> timex(cp,"/mmc1:2/codesyscontrol.out","/mmc1:2/codesyscontrol.out.bak")
copying file /mmc1:2/codesyscontrol.out -> /mmc1:2/codesyscontrol.out.bak
timex: time of execution = 137 +/- 0 (0%) secs
value = 47 = 0x2f = '/'
-> 

ftp uses the "/mmc1:2" partition. When there is a failure, it is only performing ftp download. For the file system, it is a read operation, but is the read operation blocked by the write operation elsewhere?

Summary: This is also a point of suspicion, but other subsequent tests ruled it out.

Step 3. Check network port 0 and network port 1

There are two GE network ports on the ftp server side, and the network port 0 is always used for ftp download.

When the stuck failure recurs, the ftp communication between the board and AM5728 through network port 0(cpsw0) is very stuck. At this time, the ftp communication between the board and AM5728 through network port 1(cpsw1) and AM5728 is very smooth. , change the pc to network port 0(cpsw0)ftp transmission, the same freeze.

This has eliminated many factors at once, indicating that the fault is concentrated on the network port 0 transmission chain of AM5728: network card drivers, network protocol stacks, and hardware wiring are all suspected.

Summary: Narrowing down the positioning range, the fault is concentrated on the network port 0 transmission chain of the AM5728.

Step 4. Check the network card driver and network protocol stack

Change the board to ftp communication with AM5728 through network port 1(cpsw1).

Using the same ftp test, there was no problem in the continuous test for one day.

Because the network card driver and network protocol stack of network port 0 and network port 1 are basically the same, the hardware wiring of network port 0 is more suspicious.

Summary: If the scope is further narrowed, the trace signal is suspected, but there is still a lack of direct evidence.

Step 5. Check the wiring signal of network port 0

We plan to read the mac layer statistics of network port 0 to see its error statistics. If there are various errors caused by the quality of the trace signal, the error statistics should be seen here.

Because the native driver code does not support, we add the corresponding function:

diff --git a/vxbFdtTiCpswEnd.c b/vxbFdtTiCpswEnd.c
index e60c507..4c32f0e 100644 (file)
--- a/vxbFdtTiCpswEnd.c
+++ b/vxbFdtTiCpswEnd.c
@@ -256,6 +257,7 @@ LOCAL char * rxChanErrMsg [] =
 /* common resource which is shared by all port */
 
 LOCAL CPSW_SW_CTRL * pSwCtrl = NULL;
+CPSW_STAT           tmp_cpswStat; 
 
 /* import functions */
 

@@ -2872,6 +2874,70 @@ LOCAL STATUS cpswEndStatsDump
     return OK;
     }
 
+STATUS cpswEndStatsShow
+    (
+    )
+    {
+    int              i;
+    VXB_DEV_ID       pDev = pSwCtrl->pDev;
+       UINT32         * temp = (UINT32 *)&tmp_cpswStat;
+
+    /*
+     * hardware statistic counters are write-to-decrement,
+     * after a read, we write the value read to clear
+     * the counters
+     */
+
+    for (i = 0; i < sizeof (CPSW_STAT) / sizeof (UINT32); i++)
+        {
+        *temp = CSR_READ_4 (pDev, pSwCtrl->statsOffset + i*4);
+        //CSR_WRITE_4 (pDev, pSwCtrl->statsOffset + i*4, *temp);
+        temp++;
+        }
+               
+       (* _func_kprintf)("cpswMacStat rxgood = %d\n", tmp_cpswStat.rxgood);
+       (* _func_kprintf)("cpswMacStat rxbroadcast = %d\n", tmp_cpswStat.rxbroadcast);
+       (* _func_kprintf)("cpswMacStat rxmulticast = %d\n", tmp_cpswStat.rxmulticast);
+       (* _func_kprintf)("cpswMacStat rxpause = %d\n", tmp_cpswStat.rxpause);
+       (* _func_kprintf)("cpswMacStat rxcrcerros = %d\n", tmp_cpswStat.rxcrcerros);
+       (* _func_kprintf)("cpswMacStat rxalignmenterrors = %d\n", tmp_cpswStat.rxalignmenterrors);
+       (* _func_kprintf)("cpswMacStat rxoversized = %d\n", tmp_cpswStat.rxoversized);
+       (* _func_kprintf)("cpswMacStat rxjabber = %d\n", tmp_cpswStat.rxjabber);
+       (* _func_kprintf)("cpswMacStat rxundersized = %d\n", tmp_cpswStat.rxundersized);
+       (* _func_kprintf)("cpswMacStat rxfrags = %d\n", tmp_cpswStat.rxfrags);
+       (* _func_kprintf)("cpswMacStat rxoctets = %d\n\n", tmp_cpswStat.rxoctets);
+
+       (* _func_kprintf)("cpswMacStat txgood = %d\n", tmp_cpswStat.txgood);
+       (* _func_kprintf)("cpswMacStat txbroadcast = %d\n", tmp_cpswStat.txbroadcast);
+       (* _func_kprintf)("cpswMacStat txmulticast = %d\n", tmp_cpswStat.txmulticast);
+       (* _func_kprintf)("cpswMacStat txpause = %d\n", tmp_cpswStat.txpause);
+       (* _func_kprintf)("cpswMacStat txdefered = %d\n", tmp_cpswStat.txdefered);
+       (* _func_kprintf)("cpswMacStat txcollision = %d\n", tmp_cpswStat.txcollision);
+       (* _func_kprintf)("cpswMacStat txsinglecol = %d\n", tmp_cpswStat.txsinglecol);
+       (* _func_kprintf)("cpswMacStat txmulticol = %d\n", tmp_cpswStat.txmulticol);
+       (* _func_kprintf)("cpswMacStat txexceesive = %d\n", tmp_cpswStat.txexceesive);
+       (* _func_kprintf)("cpswMacStat txlatecol = %d\n", tmp_cpswStat.txlatecol);
+       (* _func_kprintf)("cpswMacStat txunderrun = %d\n", tmp_cpswStat.txunderrun);
+       (* _func_kprintf)("cpswMacStat txcariersense = %d\n", tmp_cpswStat.txcariersense);
+       (* _func_kprintf)("cpswMacStat txoctets = %d\n\n", tmp_cpswStat.txoctets);
+
+       (* _func_kprintf)("cpswMacStat sz64octets = %d\n", tmp_cpswStat.sz64octets);
+       (* _func_kprintf)("cpswMacStat sz65_127octets = %d\n", tmp_cpswStat.sz65_127octets);
+       (* _func_kprintf)("cpswMacStat sz128_255octets = %d\n", tmp_cpswStat.sz128_255octets);
+       (* _func_kprintf)("cpswMacStat sz256_511octets = %d\n", tmp_cpswStat.sz256_511octets);
+       (* _func_kprintf)("cpswMacStat sz512_1023octets = %d\n", tmp_cpswStat.sz512_1023octets);
+       (* _func_kprintf)("cpswMacStat sz1024octets = %d\n", tmp_cpswStat.sz1024octets);
+       (* _func_kprintf)("cpswMacStat netoctets = %d\n", tmp_cpswStat.netoctets);
+       (* _func_kprintf)("cpswMacStat rxfifooverrun = %d\n", tmp_cpswStat.rxfifooverrun[0]);
+       (* _func_kprintf)("cpswMacStat rxfifooverrun = %d\n", tmp_cpswStat.rxfifooverrun[1]);
+       (* _func_kprintf)("cpswMacStat rxfifooverrun = %d\n\n", tmp_cpswStat.rxfifooverrun[2]);
+
+       (* _func_kprintf)("cpswMacStat port 0 cpswInDropped = %lld\n", pSwCtrl->port[0]->cpswInDropped);
+       (* _func_kprintf)("cpswMacStat port 1 cpswInDropped = %lld\n\n\n", pSwCtrl->port[1]->cpswInDropped);
+
+    return OK;
+    }
+
 /*******************************************************************************
 *
 * cpswEndIoctl - the driver I/O control routine

In normal transmission, occasionally there is a case of receiving crc error:

-> cpswEndStatsShow
...
cpswEndStatsShow,2900, rxgood = 1023
cpswEndStatsShow,2901, rxbroadcast = 0
cpswEndStatsShow,2902, rxmulticast = 0
cpswEndStatsShow,2903, rxpause = 0
cpswEndStatsShow,2904, rxcrcerros = 17
cpswEndStatsShow,2905, rxalignmenterrors = 0
cpswEndStatsShow,2906, rxoversized = 0
cpswEndStatsShow,2907, rxjabber = 0
cpswEndStatsShow,2908, rxundersized = 0
cpswEndStatsShow,2909, rxfrags = 0
cpswEndStatsShow,2910, rxoctets = 83903

In the case of extreme lag, the received crc error increases greatly:

-> cpswEndStatsShow
...
cpswEndStatsShow,2900, rxgood = 5544
cpswEndStatsShow,2901, rxbroadcast = 181
cpswEndStatsShow,2902, rxmulticast = 158
cpswEndStatsShow,2903, rxpause = 0
cpswEndStatsShow,2904, rxcrcerros = 109
cpswEndStatsShow,2905, rxalignmenterrors = 0
cpswEndStatsShow,2906, rxoversized = 0
cpswEndStatsShow,2907, rxjabber = 0
cpswEndStatsShow,2908, rxundersized = 0
cpswEndStatsShow,2909, rxfrags = 0
cpswEndStatsShow,2910, rxoctets = 433598

Construct a packet with full FF content, send 100, and send 100 errors:

-> cpswEndStatsShow
...
cpswEndStatsShow,2900, rxgood = 9844
cpswEndStatsShow,2901, rxbroadcast = 613
cpswEndStatsShow,2902, rxmulticast = 379
cpswEndStatsShow,2903, rxpause = 0
cpswEndStatsShow,2904, rxcrcerros = 272
cpswEndStatsShow,2905, rxalignmenterrors = 0
cpswEndStatsShow,2906, rxoversized = 0
cpswEndStatsShow,2907, rxjabber = 0
cpswEndStatsShow,2908, rxundersized = 0
cpswEndStatsShow,2909, rxfrags = 0
cpswEndStatsShow,2910, rxoctets = 764175

-> cpswEndStatsShow
...
cpswEndStatsShow,2900, rxgood = 9877
cpswEndStatsShow,2901, rxbroadcast = 635
cpswEndStatsShow,2902, rxmulticast = 387
cpswEndStatsShow,2903, rxpause = 0
cpswEndStatsShow,2904, rxcrcerros = 374
cpswEndStatsShow,2905, rxalignmenterrors = 0
cpswEndStatsShow,2906, rxoversized = 0
cpswEndStatsShow,2907, rxjabber = 0
cpswEndStatsShow,2908, rxundersized = 0
cpswEndStatsShow,2909, rxfrags = 0
cpswEndStatsShow,2910, rxoctets = 766815

The same test, network port 1 tested for a day without crc error.

Summary: It is indeed a problem with the signal quality of the hardware wiring, which causes data errors during transmission.

Step 6. Network port 0 slowdown test attempt

We try to reduce the speed of network port 0 to 100M to test whether the speed reduction can eliminate the error.

Because the native driver does not have a similar function, we also have to write one manually:

diff --git a/vxbFdtTiCpswEnd.c b/vxbFdtTiCpswEnd.c
index 4c32f0e..71afa1a 100644 (file)
--- a/vxbFdtTiCpswEnd.c
+++ b/vxbFdtTiCpswEnd.c
@@ -2938,6 +2946,126 @@ STATUS cpswEndStatsShow
     return OK;
     }
 
+STATUS cpswEndLinkSet
+(
+       UINT32 port,
+       UINT32 mode
+)
+{
+       CPSW_DRV_CTRL * pDrvCtrl;
+       
+       if ((port > 1) || (mode > 4))
+               return -1;
+       
+       pDrvCtrl = pSwCtrl->port[port];
+       
+       switch(mode)
+       {
+               case 0:
+                       (void)miiBusModeSet (pDrvCtrl->cpswMiiDev, IFM_ETHER | IFM_AUTO);
+                       (* _func_kprintf)("set port %d mode: AUTO_SEL\n");
+                       break;
+               case 1:
+                       (void)miiBusModeSet (pDrvCtrl->cpswMiiDev, IFM_ETHER|IFM_100_TX|IFM_FDX);
+                       (* _func_kprintf)("set port %d mode: 100 FDX\n");
+                       break;  
+               case 2:
+                       (void)miiBusModeSet (pDrvCtrl->cpswMiiDev, IFM_ETHER|IFM_100_TX);
+                       (* _func_kprintf)("set port %d mode: 100 HDX\n");
+                       break;
+               case 3:
+                       (void)miiBusModeSet (pDrvCtrl->cpswMiiDev, IFM_ETHER|IFM_10_T|IFM_FDX);
+                       (* _func_kprintf)("set port %d mode: 10 FDX\n");
+                       break;
+               case 4:
+                       (void)miiBusModeSet (pDrvCtrl->cpswMiiDev, IFM_ETHER|IFM_10_T);
+                       (* _func_kprintf)("set port %d mode: 10 HDX\n");
+                       break;
+               default:
+                       break;                  
+       }
+       
+       return OK;
+}

Forcibly set network port 0 to 100M full-duplex mode:

-> cpswEndLinkSet(0, 1)

The network port 0 is slowed down to 100M mode, and there is no crc error after testing for a long time.

Summary: Slowing down can solve the problem of crc error and is a workaround.

3. Conclusion

To sum up, the root cause of ftp slowdown is a large number of wrong packets and packet loss caused by unstable hardware signals.

The current workaround solution can reduce the speed of network port 0 to 100M for use.

The fundamental solution requires hardware re-wiring and re-simulation. After the single board is completed, a complete signal test needs to be done, so that the high-speed signal can have a solid and stable foundation.

Tags: crc vxworks

Posted by james_creasy on Mon, 09 May 2022 16:29:26 +0300