SB dumps data at ~20MB/s [debugging]

This topic contains 9 replies, has 2 voices, and was last updated by  Dale 7 months, 2 weeks ago.

  • #883

    Dale
    Participant

    This is an update to the issue first noted here. It concerns our support board (SB) acquiring non-representative data at ~20MB/s (via QuickUSB).

    For context and background, we have one fully operational OpenPET crate (Support Board) after implementing the fix discussed by Roger here via our custom DB firmware. Note that this operational board was manufactured post-Jan 2016. Our other support board seemed to have a different issue after our move from Chicago (where it ran successfully) to Philadelphia. However, we placed aside the original crate and its support board in favor of debugging the slot 3 & 7 timing issue above.

    We tested this Support Board for a full week and did not have an issue. This was with different OpenPET versions, DBs, slot numbers, firmware/DB flashes, and signal inputs. Alas, after all this testing, the problem returned today after a simple DB exchange. In its simplest form, we get the issue when trying to run a simple 16-channel singles mode run:

    $ openpet -c 3 0x8002 2
    2017-02-07 11:59:56,423 INFO [S] 0x0003 0x8002 0x00000002
    2017-02-07 11:59:56,625 INFO [R] 0x8003 0x8002 0x00000002
    
    $ openpet -c 7 0x8002 2
    2017-02-07 12:00:05,108 INFO [S] 0x0007 0x8002 0x00000002
    2017-02-07 12:00:05,311 INFO [R] 0x8007 0x8002 0x00000002
    
    $ openpet -a 5 -o def_singles.dat
    2017-02-07 12:00:31,427 INFO Starting qusb data acquisition...
    2017-02-07 12:00:35,848 INFO 0.587s remaining.
    2017-02-07 12:00:36,446 INFO QUSB rate is 45.319 MB/s
    2017-02-07 12:00:36,447 INFO Stopping qusb data stream...
    2017-02-07 12:00:36,450 INFO QUSB data stream has stopped.
    2017-02-07 12:00:36,450 INFO Stopping qusb data queue [0]...
    2017-02-07 12:00:36,453 INFO QUSB data queue has stopped.
    2017-02-07 12:00:36,453 INFO 19.933 MB/s written to disk, def_singles.dat
    
    $ ls -lhrt def_singles.dat
    -rwx------+ 1 CSPECT mkpasswd 101M Feb  7 12:00 def_singles.dat 
    

    In the above example, we had a single DB in slot #2 and we are using OpenPET v2.1. The (4kB) header appears correct but the data output for (nearly) all 32-bit words is

    0000 0000 0000 0000 0000 0000 0000 0110 = 6 [dec]
    

    Other runs would give 0100 = 4 [dec] in the four LSBs of each word. Note that for some runs there would occasionally be a correctly formatted word but rarely would a full event be recorded.

    Additional tests showed the problem occurred in different DB slots and for different DBs. Moreover, the problem persists even when there is no signal input (where we expect just to get the 4kB header).
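
As a rough cross-check of the numbers in the log above, the elapsed time between the start message and the rate message can be multiplied by the reported write rate. This is only a sketch: the helper names `elapsed_seconds` and `estimated_megabytes` are hypothetical (not part of the openpet tool), and whether the tool's "MB" means 10^6 or 2^20 bytes is not stated in the thread, so the result is an order-of-magnitude check.

```python
from datetime import datetime

# Timestamp layout of the openpet log lines quoted in this thread.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, stop: str) -> float:
    """Seconds between two log timestamps such as '2017-02-07 12:00:31,427'."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(stop, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


def estimated_megabytes(rate_mb_per_s: float, start: str, stop: str) -> float:
    """Estimate the data volume as reported rate times elapsed time."""
    return rate_mb_per_s * elapsed_seconds(start, stop)


# Values copied from the log: acquisition start, the rate line, and the write rate.
size_mb = estimated_megabytes(
    19.933, "2017-02-07 12:00:31,427", "2017-02-07 12:00:36,446"
)
header_mb = 4096 / 1e6  # the 4 kB header, taken here as 4096 bytes
print(f"estimated output: {size_mb:.1f} MB; header only: {header_mb:.3f} MB")
```

The estimate lands near 100 MB, in line with the 101M directory listing, whereas a run with no signal input would be expected to contain little more than the header.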

    Acquiring data with our custom firmware was consistent with the above: (nearly) every word was replaced by 0110 [6] or 0100 [4]. However, we could observe the data stream being correct via Signal Tap with the USB-blaster connected to the DB JTAG. We activated a static pattern pulser via our custom firmware, and we did observe the rare data word (e.g. 0x1FEEEEEA) that was consistent with our input pattern, but these words occurred out of sync and were mostly overwritten by the [6] and [4] output.
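
One way to quantify "nearly every word" is to histogram the 32-bit words that follow the header. The sketch below assumes a 4096-byte header (from the "4kB" figure above) and little-endian word order, which is an assumption not confirmed in this thread; `count_words` is a hypothetical helper, and the demo bytes are synthetic rather than real acquisition data.

```python
import struct
from collections import Counter

HEADER_BYTES = 4096   # the 4 kB header mentioned above, taken as 4096 bytes
WORD_FORMAT = "<I"    # ASSUMPTION: little-endian unsigned 32-bit words


def count_words(raw: bytes, header_bytes: int = HEADER_BYTES) -> Counter:
    """Count how often each 32-bit word value occurs after the header."""
    payload = raw[header_bytes:]
    usable = len(payload) - len(payload) % 4  # ignore a trailing partial word
    words = struct.iter_unpack(WORD_FORMAT, payload[:usable])
    return Counter(word for (word,) in words)


if __name__ == "__main__":
    # Synthetic stand-in for a data file: header, mostly 6s, one pattern word.
    fake = bytes(HEADER_BYTES) + struct.pack("<5I", 6, 6, 0x1FEEEEEA, 6, 4)
    for value, n in count_words(fake).most_common():
        print(f"0x{value:08X}: {n}")
```

For a real run, the same function could be applied to the bytes of a file such as def_singles.dat (read in binary mode); a healthy file should not be dominated by the values 6 and 4.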

    Has this issue been observed before? How would you suggest we proceed in debugging it further?

    #887

    Faisal
    Keymaster

    Hi Dale,

    We have not observed this issue before.
    Let me confirm a few things first. You have two Support Boards (SBs): an old SB and a new SB (post-2016 fab). The main suspect here is the old SB. Is that correct? Also, the new SB is fully functional using all DBs, is that correct? Do you have a HostPC board?

    Regarding the suspicious SB, we will start with simple diagnostics:
    0- Visual inspection of LEDs:
    A- If you have a HostPC board, all LEDs should be lit. See http://openpet.lbl.gov/img_4693/
    B- If you don’t have a HostPC, compare the LEDs on the back of the new and old SBs. Do they look the same? Do they look like this http://openpet.lbl.gov/img_4620/ ?

    These LEDs confirm that the voltages on the board are within spec.

    1- JTAG functionality:
    A- Using the default SB jumpers, does Altera Quartus Programmer recognize the Main FPGA?
    B- Look at https://openpet-developers-guide.readthedocs.io/en/latest/system_troubleshooting.html?highlight=jtag#system-troubleshooting
    Configure your SB jumpers as shown in figures 68 and 67. Does Quartus Programmer recognize all three FPGAs (Main, IO1, and IO2)?

    2- Confirm that your QuickUSB module is flashed to the latest version and functioning correctly. Try swapping the two modules you have. Please wear ESD friendly attire 🙂

    A final note: when testing singles mode, please use version v2.3.1. Use $ openpet -c 0x0010 0x0800 0 to verify the version.

    Please report back here and we will provide more suggestions later on.

    Faisal

    #888

    Dale
    Participant

    Thanks Faisal,

    The main suspect here is the old SB. Is that correct? Also, the new SB is fully functional using all DBs, is that correct? Do you have a HostPC board?

    Yes, the suspect board is the pre-Jan2016 version that we have had since 2014. Yes, fully functional on the newer SB and we have 14 fully tested DBs. No, we do not have a HostPC board; we just have the Quick-USB mounted on the SB for PC communication/data transfer.

    0-B:
    Our setup shows the same LEDs, modulo an additional LED on DN 13 (which relates to the QuickUSB). See here for a photo of the SB LEDs. The jumpers (J10 & J19) are also highlighted in the photo and are consistent with the correct setup and with our other SB.

    1-A:
    We appear to be able to use the JTAG to program without issue. Simple communication:

    $ jtagconfig -n
    1) USB-Blaster [USB-0]
      020F40DD   EP3C40/EP4CE(30|40)
        Node 0C006E00  JTAG UART #0
        Node 19104600  Nios II #0
        Design hash    AF3C215664A671B7FEC6
    
    $ nios2-terminal
    nios2-terminal: connected to hardware target using JTAG UART on cable
    nios2-terminal: "USB-Blaster [USB-0]", device 1, instance 0
    nios2-terminal: (Use the IDE stop button or Ctrl-C to terminate)
    
    OpenPET LBNL SupportBoard-Main FPGA
    Loading children FPGAs with bitstream stored on EPCS
    QUSB Waiting
    
    nios2-terminal: exiting due to ^C on host

    Loading OpenPET v2.3.1 flash for SB

    $ pwd
    /cygdrive/c/OpenPET/v2.3.1/supportboard
    
    $ ./flashBoard.sh
    [Default] CDUC firmware and software will be flashed.
    [OpenPET] Was your Support Board manufactured after January 2016? [y/n] n
    [OpenPET] Using default USB cable [1].
    [OpenPET] Running "quartus_pgm -c 1 -m jtag -o ipv;./bin/CDUC/sb64.jic"
    Info: *******************************************************************
    Info: Running Quartus II 32-bit Programmer
        Info: Version 13.1.0 Build 162 10/23/2013 SJ Full Version
        Info: Copyright (C) 1991-2013 Altera Corporation. All rights reserved.
        Info: Your use of Altera Corporation's design tools, logic functions
        Info: and other software and tools, and its AMPP partner logic
        Info: functions, and any output files from any of the foregoing
        Info: (including device programming or simulation files), and any
        Info: associated documentation or information are expressly subject
        Info: to the terms and conditions of the Altera Program License
        Info: Subscription Agreement, Altera MegaCore Function License
        Info: Agreement, or other applicable license agreement, including,
        Info: without limitation, that your use is for the sole purpose of
        Info: programming logic devices manufactured by Altera and sold by
        Info: Altera or its authorized distributors.  Please refer to the
        Info: applicable agreement for further details.
        Info: Processing started: Wed Feb 08 11:00:34 2017
    Info: Command: quartus_pgm -c 1 -m jtag -o ipv;./bin/CDUC/sb64.jic
    Info (213045): Using programming cable "USB-Blaster [USB-0]"
    Info (213011): Using programming file ./bin/CDUC/sb64.jic with checksum 0x4978888A for device EP3C40@1
    Info (209060): Started Programmer operation at Wed Feb 08 11:00:35 2017
    Info (209016): Configuring device index 1
    Info (209017): Device 1 contains JTAG ID code 0x020F40DD
    Info (209007): Configuration succeeded -- 1 device(s) configured
    Info (209018): Device 1 silicon ID is 0x16
    Info (209044): Erasing ASP configuration device(s)
    Info (209023): Programming device(s)
    Info (209021): Performing CRC verification on device(s)
    Info (209011): Successfully performed operation(s)
    Info (209061): Ended Programmer operation at Wed Feb 08 11:01:59 2017
    Info: Quartus II 32-bit Programmer was successful. 0 errors, 0 warnings
        Info: Peak virtual memory: 187 megabytes
        Info: Processing ended: Wed Feb 08 11:01:59 2017
        Info: Elapsed time: 00:01:25
        Info: Total CPU time (on all processors): 00:00:04
    [OpenPET] Done programming board. Please reboot chassis.
    [OpenPET] Press [Enter] to close this window

    1-B:
    I have not tested this yet. What do you do in the cases where you want to jump across pins? For example, from the Main TDO to IO1 TDI (i.e. J62 to J69). I have the jumper/shunts to do the 8-pairs of pins but how do I physically handle the blue & green lines on figure 68?

    2:
    Quick-USB driver correctly set to v2.15.2 (see below). Note that this PC, USB cables, etc. are all common to our tests of the two SBs. Before we swap the Quick-USB, are there any other tests we can run to confirm that the interface between the output of the SB and the Quick-USB card is working correctly? The limited diagnostic options provided by Bitwise show a fully working/communicating module that is otherwise (modulo serial number) identical to the setup in the other crate/SB. Note that in previous tests I have swapped the Quick-USB modules on the SBs and the problem stayed on this older/original SB (independent of the swap).

    Test Command:
    Using OpenPET v2.3.1, I tried the suggested command (verbose option turned on):

    $ openpet -v -c 0x0010 0x0800 0
    2017-02-08 13:53:39,153 DEBUG OpenPET v2.3.1
    2017-02-08 13:53:39,157 DEBUG Found QUSB-0
    2017-02-08 13:53:39,158 DEBUG QUSB DLL Version: v2.15.2
    2017-02-08 13:53:39,160 DEBUG QUSB Driver Version: v2.15.2
    2017-02-08 13:53:39,161 DEBUG QUSB Firmware Version: v2.15.2
    2017-02-08 13:53:39,163 DEBUG QUSB Writing to register(s):
    2017-02-08 13:53:39,164 DEBUG   [1]=0x0001
    2017-02-08 13:53:39,167 DEBUG   [2]=0xc000
    2017-02-08 13:53:39,168 DEBUG   [3]=0x0002
    2017-02-08 13:53:39,171 DEBUG   [5]=0x8010
    2017-02-08 13:53:39,177 INFO [S] 0x0010 0x0800 0x00000000
    2017-02-08 13:53:39,177 DEBUG Sending:
    2017-02-08 13:53:39,178 DEBUG ID 0x0010
    2017-02-08 13:53:39,180 DEBUG SRC 0x4000
    2017-02-08 13:53:39,180 DEBUG DST 0x0800
    2017-02-08 13:53:39,180 DEBUG PAYLOAD 0x00000000
    2017-02-08 13:53:39,384 DEBUG Received [retries 1/20]:
    2017-02-08 13:53:39,387 DEBUG ID 0x8010:
    2017-02-08 13:53:39,388 DEBUG SRC 0x0800:
    2017-02-08 13:53:39,391 DEBUG DST 0x4000:
    2017-02-08 13:53:39,394 DEBUG PAYLOAD 0x00000004:
    2017-02-08 13:53:39,397 INFO [R] 0x8010 0x0800 0x00000004
    2017-02-08 13:53:39,398 DEBUG Done.
    • This reply was modified 7 months, 2 weeks ago by  Dale. Reason: formating/code block edit
    #892

    Faisal
    Keymaster

    1-B: Use female-to-female jumper wires. The shorter the wire, the better.
    I typed the wrong command for the version. Please try this:
    openpet -v -c 0x0011 0x0800 0

    We need to verify that the IO FPGAs are OK. Once you confirm that jtagconfig can see them, then do the following:
    0- Connect JTAG to the SB with the default jumper configuration
    1- Download http://openpet.lbl.gov/wp-content/uploads/2017/02/debug_img.zip
    2- unzip and note directory
    3- Open Quartus -> Open-> dropdown menu Files of Types: select “All files (*.*)” -> open sng_rnd.stp
    4- Program the device using the sof image in zip
    5- Count to 10
    6- Highlight the instance then run analysis.
    7- Acquire data as usual using the commands you pasted earlier. Signal Tap should trigger. Send a screenshot of the waveforms. If it doesn’t trigger, we need to debug the IO FPGAs.

    Other questions to answer:
    A- Is the SB PCB arched or bowed?
    B- Are the main power cables securely screwed?
    C- Basically, do a visual inspection for screws penetrating the PCB, blown capacitors, darker than usual soldermask, etc.

    Finally, your jumper configuration looks OK.

    • This reply was modified 7 months, 2 weeks ago by  Faisal.
    #894

    Dale
    Participant

    Quick Reply on the 0011 command:

    $ openpet -v -c 0x0011 0x0800 0
    2017-02-08 15:35:54,602 DEBUG OpenPET v2.3.1
    2017-02-08 15:35:54,609 DEBUG Found QUSB-0
    2017-02-08 15:35:54,614 DEBUG QUSB DLL Version: v2.15.2
    2017-02-08 15:35:54,615 DEBUG QUSB Driver Version: v2.15.2
    2017-02-08 15:35:54,617 DEBUG QUSB Firmware Version: v2.15.2
    2017-02-08 15:35:54,618 DEBUG QUSB Writing to register(s):
    2017-02-08 15:35:54,621 DEBUG   [1]=0x0001
    2017-02-08 15:35:54,622 DEBUG   [2]=0xc000
    2017-02-08 15:35:54,625 DEBUG   [3]=0x0002
    2017-02-08 15:35:54,628 DEBUG   [5]=0x8010
    2017-02-08 15:35:54,635 INFO [S] 0x0011 0x0800 0x00000000
    2017-02-08 15:35:54,637 DEBUG Sending:
    2017-02-08 15:35:54,638 DEBUG ID 0x0011
    2017-02-08 15:35:54,638 DEBUG SRC 0x4000
    2017-02-08 15:35:54,640 DEBUG DST 0x0800
    2017-02-08 15:35:54,641 DEBUG PAYLOAD 0x00000000
    2017-02-08 15:35:54,845 DEBUG Received [retries 1/20]:
    2017-02-08 15:35:54,848 DEBUG ID 0x8011:
    2017-02-08 15:35:54,849 DEBUG SRC 0x0800:
    2017-02-08 15:35:54,851 DEBUG DST 0x4000:
    2017-02-08 15:35:54,854 DEBUG PAYLOAD 0x00002031:
    2017-02-08 15:35:54,855 INFO [R] 0x8011 0x0800 0x00002031
    2017-02-08 15:35:54,858 DEBUG Done.
    #895

    Dale
    Participant

    1-B:

    Here is the result of the jtagconfig command using the JTAG jumper-pin chain for debugging on the SB.

    $ jtagconfig -n
    1) USB-Blaster [USB-0]
      020F40DD   EP3C40/EP4CE(30|40)
        Node 0C006E00  JTAG UART #0
        Node 19104600  Nios II #0
        Design hash    F0FBFE7DE08E4246C75C
      020F40DD   EP3C40/EP4CE(30|40)
        Node 19104600  Nios II #0
        Node 0C006E00  JTAG UART #0
        Design hash    45736B774D4EBB3D57D1
      020F40DD   EP3C40/EP4CE(30|40)
        Node 19104600  Nios II #0
        Node 0C006E00  JTAG UART #0
        Design hash    45736B774D4EBB3D57D1

    We will transition back to the default SB (green) jumper setup and follow up with your suggested test above.
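
The chain listing above can also be checked mechanically. The sketch below parses text in the shape of the `jtagconfig -n` output quoted in this post and returns one record per device, so that the device count and design hashes can be compared. The regular expressions are fitted only to the output shown here (other Quartus versions may print it differently), and `parse_jtag_chain` is a hypothetical helper name.

```python
import re

# Device lines look like "  020F40DD   EP3C40/EP4CE(30|40)".
DEVICE_RE = re.compile(r"^\s*([0-9A-Fa-f]{8})\s+(\S+)\s*$")
# Hash lines look like "    Design hash    F0FBFE7DE08E4246C75C".
HASH_RE = re.compile(r"^\s*Design hash\s+([0-9A-Fa-f]+)\s*$")


def parse_jtag_chain(text: str) -> list[dict]:
    """Return one dict per device found in the listing, in chain order."""
    devices = []
    for line in text.splitlines():
        match = DEVICE_RE.match(line)
        if match:
            devices.append(
                {"idcode": match.group(1), "name": match.group(2), "design_hash": None}
            )
            continue
        match = HASH_RE.match(line)
        if match and devices:
            # A hash line belongs to the most recently listed device.
            devices[-1]["design_hash"] = match.group(1)
    return devices


SAMPLE = """\
1) USB-Blaster [USB-0]
  020F40DD   EP3C40/EP4CE(30|40)
    Node 0C006E00  JTAG UART #0
    Design hash    F0FBFE7DE08E4246C75C
  020F40DD   EP3C40/EP4CE(30|40)
    Design hash    45736B774D4EBB3D57D1
  020F40DD   EP3C40/EP4CE(30|40)
    Design hash    45736B774D4EBB3D57D1
"""

chain = parse_jtag_chain(SAMPLE)
print(len(chain), [d["design_hash"] for d in chain])
```

In the listing from this post, three devices with the same ID code appear, and the second and third share a design hash that differs from the first.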

    #896

    Dale
    Participant

    Hi Faisal,

    I cannot seem to progress to taking singles data at step #7. Programming the SB appears to prevent me from running the normal OpenPET commands. For example:

    $ openpet -v -c 1 0x8002 0xABCD0123
    2017-02-09 14:49:56,233 DEBUG OpenPET v2.3.1
    2017-02-09 14:49:56,237 DEBUG Found QUSB-0
    2017-02-09 14:49:56,240 DEBUG QUSB DLL Version: v2.15.2
    2017-02-09 14:49:56,240 DEBUG QUSB Driver Version: v2.15.2
    2017-02-09 14:49:56,240 DEBUG QUSB Firmware Version: v2.15.2
    2017-02-09 14:49:56,240 DEBUG QUSB Writing to register(s):
    2017-02-09 14:49:56,242 DEBUG   [1]=0x0001
    2017-02-09 14:49:56,243 DEBUG   [2]=0xc000
    2017-02-09 14:49:56,243 DEBUG   [3]=0x0002
    2017-02-09 14:49:56,244 DEBUG   [5]=0x8010
    2017-02-09 14:49:56,250 INFO [S] 0x0001 0x8002 0xABCD0123
    2017-02-09 14:49:56,250 DEBUG Sending:
    2017-02-09 14:49:56,250 DEBUG ID 0x0001
    2017-02-09 14:49:56,252 DEBUG SRC 0x4000
    2017-02-09 14:49:56,252 DEBUG DST 0x8002
    2017-02-09 14:49:56,253 DEBUG PAYLOAD 0xABCD0123
    2017-02-09 14:50:00,255 WARNING Controller Unit is not responding. Try again or restart.
    2017-02-09 14:50:00,259 WARNING Controller Unit is replying with zeros.
    2017-02-09 14:50:00,260 DEBUG Received [retries 20/20]:
    2017-02-09 14:50:00,265 DEBUG ID 0x0000:
    2017-02-09 14:50:00,266 DEBUG SRC 0x0000:
    2017-02-09 14:50:00,269 DEBUG DST 0x0000:
    2017-02-09 14:50:00,270 DEBUG PAYLOAD 0x00000000:
    2017-02-09 14:50:00,273 INFO [R] 0x0000 0x0000 0x00000000
    2017-02-09 14:50:00,275 DEBUG Done.

    Prior to reprogramming, we do not have this issue:

    $ openpet -c 1 0x8002 0xABCD0123
    2017-02-09 14:29:45,680 INFO [S] 0x0001 0x8002 0xABCD0123
    2017-02-09 14:29:45,882 INFO [R] 0x8001 0x8002 0xABCD0123
    
    $ openpet -c 3 0x8002 2
    2017-02-09 14:29:54,727 INFO [S] 0x0003 0x8002 0x00000002
    2017-02-09 14:29:54,930 INFO [R] 0x8003 0x8002 0x00000002
    
    $ openpet -c 7 0x8002 2
    2017-02-09 14:30:00,953 INFO [S] 0x0007 0x8002 0x00000002
    2017-02-09 14:30:01,157 INFO [R] 0x8007 0x8002 0x00000002
    
    $ openpet -a 6 -o pre_LBL_test2.dat
    2017-02-09 14:31:23,793 INFO Starting qusb data acquisition...
    2s remaining.
    2017-02-09 14:31:29,815 INFO QUSB rate is 45.303 MB/s
    2017-02-09 14:31:29,816 INFO Stopping qusb data stream...
    2017-02-09 14:31:29,819 INFO QUSB data stream has stopped.
    2017-02-09 14:31:29,819 INFO Stopping qusb data queue [0]...
    2017-02-09 14:31:29,821 INFO QUSB data queue has stopped.
    2017-02-09 14:31:29,822 INFO 0.0 MB/s written to disk, pre_LBL_test2.dat

    The programming itself does not seem to be an issue; it shows 100% success:

    Info (209060): Started Programmer operation at Thu Feb 09 14:45:33 2017
    Info (209016): Configuring device index 1
    Info (209017): Device 1 contains JTAG ID code 0x020F40DD
    Info (209007): Configuration succeeded -- 1 device(s) configured
    Info (209011): Successfully performed operation(s)
    Info (209061): Ended Programmer operation at Thu Feb 09 14:45:35 2017

    What am I missing here?
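
For reference, the difference between the working and failing exchanges in the logs above can be detected from the INFO lines alone. In every successful pair shown in this thread, the reply ID equals the command ID with the top bit (0x8000) set and the address field is echoed, while the failing reply is all zeros. That rule is inferred from these logs only, not from OpenPET documentation, and `check_exchanges` is a hypothetical helper.

```python
import re

# Matches lines such as "... INFO [S] 0x0001 0x8002 0xABCD0123".
MSG_RE = re.compile(
    r"INFO \[([SR])\] (0x[0-9A-Fa-f]{4}) (0x[0-9A-Fa-f]{4}) (0x[0-9A-Fa-f]{8})"
)
REPLY_FLAG = 0x8000  # inferred from the logs: reply ID = command ID | 0x8000


def check_exchanges(log_text: str) -> list[tuple[int, bool]]:
    """Pair each [S] line with the next [R] line; return (command ID, looks_ok)."""
    results = []
    pending = None
    for kind, ident, addr, _payload in MSG_RE.findall(log_text):
        fields = (int(ident, 16), int(addr, 16))
        if kind == "S":
            pending = fields
        elif pending is not None:
            sent_id, sent_addr = pending
            ok = fields[0] == (sent_id | REPLY_FLAG) and fields[1] == sent_addr
            results.append((sent_id, ok))
            pending = None
    return results


GOOD = """\
2017-02-09 14:29:45,680 INFO [S] 0x0001 0x8002 0xABCD0123
2017-02-09 14:29:45,882 INFO [R] 0x8001 0x8002 0xABCD0123
"""
BAD = """\
2017-02-09 14:49:56,250 INFO [S] 0x0001 0x8002 0xABCD0123
2017-02-09 14:50:00,273 INFO [R] 0x0000 0x0000 0x00000000
"""

print(check_exchanges(GOOD), check_exchanges(BAD))
```

The payload is deliberately not compared, since for some commands in this thread (for example 0x0010 and 0x0011) the reply payload differs from the one sent.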

    #897

    Dale
    Participant

    Alas, the SB went back to fully working yesterday and I have not been able to perturb the system back to the error mode. I will make a few adjustments to see if I can get the problem to occur for debugging. I will repeat the 1-B test if I am able.

    I have uploaded a photo album of the pictures taken so far of the SB: see SB inspection photos.

    With respect to your inspection questions:

    A – The board is not warped or bowed

    B – Main power cables are firmly connected. Directly measured voltages are correct and stable, and the DB LEDs also all light up correctly when the DBs are inserted.

    C – No blown capacitors that I can see. Likewise, no issues with screws or with the board being damaged or stressed. As shown in the linked photos, there is a little bit of discoloration around the Main FPGA. I cleaned it and removed dust elsewhere using a fine paint-brush cleaned on an alcohol wipe. This seemed to improve its appearance, so it may have been a bit of “stain” on the board there.

    #898

    Faisal
    Keymaster

    I am not sure what’s causing the issue yet. Please let us know once it goes bad again.

    • This reply was modified 7 months, 2 weeks ago by  Faisal.
    #899

    Dale
    Participant

    Will do.

    So far we haven’t gotten the system to go bad despite many power-cycles and tests, etc.

    • This reply was modified 7 months, 2 weeks ago by  Dale.