U-verse down every 1.5 - 2 hours

Employee

U-verse down every 1.5 - 2 hours

About every 1.5 to 2 hours, give or take 10 minutes, my U-verse RG, a Pace 2wire 3801HGV, locks up completely and never recovers on its own (the longest I could wait was 32 hours). I only have internet and TV service through U-verse, and both become completely unavailable when this happens. To restore service, I have to pull the power cable out of the RG, wait 10 seconds, then plug it back in and wait for it to sync up. Please see below for detailed troubleshooting data.

 

When this problem started 4 weeks ago, I had my original 2wire 3800HGV-B, which had worked flawlessly for 4 years. The first field service tech tested the outside and inside wiring, found no errors, and replaced the 3800HGV-B with a (now Pace) 2wire 3801HGV, which was either refurbished or gently used. The result was the same problem every 1.5 to 2 hours. The second field service tech repeated the first tech's tests, but replaced the 3801HGV with a brand new 3801HGV. The result was the same problem every 1.5 to 2 hours. The third field service tech didn't bother replacing the RG, but blamed my network for doing something to make the RG unavailable. So, my next test case was to see if the RG failed within two hours with nothing connected to the Ethernet ports. I disconnected my switch from the RG such that the only things connected to the RG were coax (to the IPTV set top box), DSL, and power. In 1 hour 38 minutes, the RG failed again.

 

What should I try next?

 

Below  are some data I collected to help troubleshoot.

 

Case of:  RG in working state

1) LED status

   a) power = solid green

   b) ethernet = blinking green

   c) wireless = blinking green

   d) homepna = blinking green

   e) phone 1 & 2 = off (not ordered)

   f) broadband = solid green

   g) service = solid green

 

2) TV/DVR status

   a) all services available

 

3) Ethernet status

   a) normal traffic  from RG seen by tcpdump or wireshark

 

4) Wireless status

   a) SSID being broadcast, auto-auth works

 

 

Case of: RG in non-working state

1) LED status

   a) power = solid green

   b) ethernet = blinking green

   c) wireless = off

   d) homepna = blinking green

   e) phone 1 & 2 = off (not ordered)

   f) broadband = solid green

   g) service = solid green

 

2) TV/DVR status

   a) error "cannot connect to IPTV"

 

3) Ethernet status

   a) only ARP traffic seen from RG by tcpdump and wireshark; no response to ICMP echo or bootpc. The RG is broadcasting ARP requests for the only host on the network (the one running wireshark). The host responds to the MAC of the RG, 60:c3:97:97:41:69.

 

4) Wireless status

   a) SSID being broadcast, but no authentication. Connections time out after the WPA2-PSK is entered

 

When my switch was connected, it showed the following when the RG went dead (note that the interface is still up/up):

switch#ping 192.168.1.254

 

Type escape sequence to abort.

Sending 5, 100-byte ICMP Echos to 192.168.1.254, timeout is 2 seconds:

.....

Success rate is 0 percent (0/5)

switch#sh int gi0/1

GigabitEthernet0/1 is up, line protocol is up

  Hardware is MV96340 Ethernet, address is 0013.c4e5.aec1 (bia 0013.c4e5.aec1)

  Internet address is 192.168.1.71/24

  MTU 1500 bytes, BW 100000 Kbit/sec, DLY 100 usec,

     reliability 255/255, txload 1/255, rxload 1/255

  Encapsulation ARPA, loopback not set

  Keepalive set (10 sec)

  Full-duplex, 100Mb/s, media type is T

  output flow-control is XON, input flow-control is XON

  ARP type: ARPA, ARP Timeout 04:00:00

  Last input 00:00:00, output 00:00:02, output hang never

  Last clearing of "show interface" counters never

  Input queue: 0/75/280/0 (size/max/drops/flushes); Total output drops: 0

  Queueing strategy: fifo

  Output queue: 0/40 (size/max)

  5 minute input rate 0 bits/sec, 0 packets/sec

  5 minute output rate 0 bits/sec, 0 packets/sec

     4627730 packets input, 454824757 bytes, 0 no buffer

     Received 3721985 broadcasts, 0 runts, 0 giants, 4 throttles

     424 input errors, 0 CRC, 0 frame, 0 overrun, 424 ignored

     0 watchdog, 0 multicast, 0 pause input

     0 input packets with dribble condition detected

     382182 packets output, 92449948 bytes, 0 underruns

     0 output errors, 0 collisions, 1 interface resets

     0 unknown protocol drops

     0 babbles, 0 late collision, 0 deferred

     0 lost carrier, 0 no carrier, 0 pause output

     0 output buffer failures, 0 output buffers swapped out

switch#
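
The counters in the `sh int gi0/1` output above can be summarized with a short script. The following is a minimal sketch in Python and is not part of the original post: the function name `summarize_counters` and the regular expressions are assumptions written against the exact text pasted above, and other IOS versions may format these lines differently. It reports what share of received packets were broadcasts and how many input errors were logged.

```python
import re

# Excerpt of the `sh int gi0/1` output pasted above (only the lines parsed here).
SHOW_INT = """
     4627730 packets input, 454824757 bytes, 0 no buffer
     Received 3721985 broadcasts, 0 runts, 0 giants, 4 throttles
     424 input errors, 0 CRC, 0 frame, 0 overrun, 424 ignored
     382182 packets output, 92449948 bytes, 0 underruns
"""

def summarize_counters(text):
    """Pull a few counters out of `show interface` text (illustrative helper)."""
    patterns = {
        "packets_input": r"(\d+) packets input",
        "broadcasts": r"Received (\d+) broadcasts",
        "input_errors": r"(\d+) input errors",
        "ignored": r"(\d+) ignored",
        "packets_output": r"(\d+) packets output",
    }
    counters = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"counter not found: {name}")
        counters[name] = int(match.group(1))
    # Fraction of all received packets that were broadcasts.
    counters["broadcast_share"] = counters["broadcasts"] / counters["packets_input"]
    return counters

c = summarize_counters(SHOW_INT)
print(f"broadcast share of input: {c['broadcast_share']:.1%}")
print(f"input errors: {c['input_errors']} (ignored: {c['ignored']})")
```

Keep in mind that the output says the counters were never cleared, so these totals cover the whole life of the switch port, not just the window around a freeze. Clearing the counters right after an RG reboot and re-reading them after the next freeze would give a cleaner comparison.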

Current RG system information:

 

Manufacturer    Pace Plc

Model   3801HGV

Serial Number   281319049214

Hardware Version           000778-004

Software Version             6.9.1.42-enh.tm

Key Code             [redacted]

First Use Date    October 8, 2013

Current Date & Time      Monday, October 14, 2013

4:09:04 PM

Central Daylight Time

Time Since Last Boot       0 day 1:17:30

DSL Modem       8.4.C

System Password            [redacted]

System Software Components

 

att-lightspeed   162588

att-lightspeed-fw-1_ippolicy      159800

att-lightspeed_cms-certs             160219

att-lightspeed_eapol-certs          159801

att-lightspeed_en           160725

att-lightspeed_providerconf      160186

att-unified-lightspeed_config    162551

base_voice         159364

common_en      160676

common_ui        162552

system 160199

Firewall Rules    1000

Application List  1001

 

Also, I've included a screenshot of my UV Realtime during the 1.5 to 2 hours of uptime:

 

 

Thanks for any help!

Employee Contributor*
*I am an AT&T employee and the postings on this site are my own and don't necessarily represent AT&T's position, strategies or opinions.
Message 1 of 16 (1,016 Views)
ACE - Expert

Re: U-verse down every 1.5 - 2 hours


noguanoguy - It is good that you eliminated the switch by disconnecting it. Can you do the same for the HPNA by disconnecting the coax and rebooting? Can you connect a TV by cat5 instead of the coax? It can be further complicated if the NID to RG run is coax, in which case you would have a diplexer. How much coax do you have? Any splitters? A fault anywhere in the coax cabling could cause your problem.

*The views and opinions expressed on this forum are purely my own. Any product claim, statistic, quote, or other representation about a product or service should be verified with the manufacturer, provider, or party.
Message 2 of 16 (1,014 Views)
ACE - Expert

Re: U-verse down every 1.5 - 2 hours

Would be really helpful to see the bitloading graph and the errors tab if you can get them after a few minutes of operation.

 

Since you've replaced the device 3 times now (if I counted right), it certainly seems like there is something on your line, home or power that's causing this issue.  We'll do our best to help you track it down.

 

Message 3 of 16 (1,002 Views)
Employee

Re: U-verse down every 1.5 - 2 hours

I disconnected the HPNA coax, such that the only cable connected to the RG was the DSL cable. The RG hung between 1:30:00 and 1:42:00. I could only check via the wireless interface, so that's as close as I could get to the time it froze. In addition, all field service techs used their coax testers and got clean signal from RG to the set top box.
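
Since the freeze time above could only be bracketed between 1:30:00 and 1:42:00, an unattended probe would narrow it down. Below is a rough sketch in Python, assuming the RG's web interface answers on TCP port 80 at 192.168.1.254 (the address used elsewhere in this thread). The function names, the 10-second interval, and the three-miss threshold are illustrative choices, not anything provided by AT&T or Pace.

```python
import socket
import time

RG_ADDRESS = "192.168.1.254"   # gateway address used in this thread
RG_PORT = 80                   # assumption: the RG web UI listens on TCP 80

def tcp_probe(host=RG_ADDRESS, port=RG_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def watch_until_down(probe=tcp_probe, interval=10.0, misses_allowed=3,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it fails misses_allowed times in a row.

    Returns the seconds elapsed from the start of the watch to the first
    failed probe of the final failing streak.
    """
    start = clock()
    misses = 0
    first_miss_at = None
    while True:
        if probe():
            # Any success resets the streak, so a single lost probe is ignored.
            misses = 0
            first_miss_at = None
        else:
            if misses == 0:
                first_miss_at = clock()
            misses += 1
            if misses >= misses_allowed:
                return first_miss_at - start
        sleep(interval)

# Example use, started right after the RG finishes booting:
#   elapsed = watch_until_down()
#   print(f"RG unreachable after {elapsed / 60:.1f} minutes")
```

Requiring several misses in a row keeps one dropped Wi-Fi probe from being counted as a freeze. With a 10-second interval, the reported time should be accurate to within roughly one polling interval.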

 

Unfortunately, I can't connect via cat 5 or 6 because the devices are separated by at least 40 feet from one end of my house to the other, and powerline Ethernet can only deliver SD quality video at about 10 Mbps. Ethernet over coax was my last option, without running all new inside wiring for cat 6. However, I haven't tested any new powerline Ethernet devices since IEEE 1901 was ratified. There is approximately 50 feet of coax, and no splitters or couplers.

 

I've attached the bitloading and errors tabs from UV Realtime. One note on these screenshots: I have to connect one Ethernet host running Windows to the RG to run UV Realtime. When I stopped in to check on that host close to freeze time, I noticed the "corrected blocks" error count had gone up to 434. The next time I checked on it, the RG had frozen, and apparently no data is saved from UV Realtime. It would be great if I could get a syslog feed to import into Splunk. At any rate, I'm grateful for the assistance and eager to resolve this problem.

 

Message 4 of 16 (955 Views)
ACE - Expert

Re: U-verse down every 1.5 - 2 hours

noguanoguy - My only concern is whether the HPNA was deactivated when you removed the coax. Did you reboot the RG? Power down - remove coax - repower. If the HPNA is still active, the open connector could cause problems.

It is sounding like the problem is the RG or upstream. Maybe someone who knows realtime can see something.

If it were me, I would get a 50' cat5 from monoprice.com (actually I have one in the storage bin downstairs), run it down the hall, and reboot just to see what happens.
Message 5 of 16 (946 Views)
Expert

Re: U-verse down every 1.5 - 2 hours

I'm thinking bad RG. I've seen/heard of these units getting slow or sometimes having configuration issues requiring a factory reset, but never have I heard of one locking up before.
Message 6 of 16 (944 Views)
ACE - Expert

Re: U-verse down every 1.5 - 2 hours

I think I remember somebody (like TexasGuy or someone like that), having an issue where his RG locked up at predictable intervals.  I don't remember the solution, though.

 

Message 7 of 16 (937 Views)
Employee

Re: U-verse down every 1.5 - 2 hours

aviewer - I pulled the power on the RG before unscrewing the coax, and once it was removed I reconnected power. As stated above, that test resulted in the same freeze up behavior.

 

SomeJoe7777 - that's what I thought at first, but this is my third RG, and second 3801HGV. Since I factory reset the RG, the only error in the system log is:

 

ERR

P0000-00-00T00:00:15

pkgd: cm_tran_begin get thread context thr:pkgd - each thread is allowed only a single transaction

 

JetterMC - texasguy37 had a very similar issue with a 3800HGV-B, mine is a 3801HGV, which is the same model that solved his problem. At this point I'm not ruling out the possibility of a firmware bug since I have updated to 6.9.1.42-enh.tm on the 3801HGV.

 

I had an idea last night to just run the gateway with nothing but power connected to it, and see what happens. So, I pulled the power, disconnected the DSL and Ethernet cables, then plugged the RG back in. I monitored via wi-fi and the RG stayed reachable for over 16 hours, way past the normal 1:36. Then I powered it down again, installed the DSL RJ11, and returned to normal service. In 1:36, the RG went unreachable again. Right before it went unreachable, I saw corrected blocks jump up to 4868, with one uncorrectable, but the app hung and reset before I could get a new screenshot.  Therefore, I am currently thinking the problem exists between inside wiring and VRAD.

 

One more test I wanted to run was a factory reset after the last failure. I have done that and am now 1:03 into the current session.

 

Thanks for all the advice, it really encourages me to continue tinkering with this issue in hopes of reaching a solution.

 

Below are the most recent screenshots of the case where I disconnected everything but the power:

Message 8 of 16 (906 Views)
Community Support

Re: U-verse down every 1.5 - 2 hours

Hi noguanoguy,

 

I apologize for all the inconvenience. Like everyone else, my best guess is that it is possibly a bad RG. The interesting thing I noticed in your post is that the RG even has trouble with local connections, such as when you try to reach the gateway at http://192.168.1.254.

One thing you want to check for is your NAT sessions. You just want to login to your router and go to Settings -> Diagnostics -> NAT.

From there you will see the session table and if it is 500/1024 or lower, you want to start reading the session table to see what is taking everything up.

I know when I hit 0/1024, I can no longer log into my router, but my internet still works, but sometimes it is horribly slow or I get a page load error.
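
The rule of thumb above can be written down as a quick check. This is only a sketch: it assumes the diagnostics page shows the session table as `available/total` (which is how the `500/1024` and `0/1024` figures above read), and the function name and default threshold of 500 are illustrative.

```python
def nat_table_needs_review(reading, threshold=500):
    """Return True when available NAT sessions are at or below threshold.

    reading is text like "500/1024", assumed to mean available/total.
    """
    available_text, total_text = reading.strip().split("/")
    available, total = int(available_text), int(total_text)
    if not 0 <= available <= total:
        raise ValueError(f"implausible session reading: {reading!r}")
    return available <= threshold

# "1024/1024" is a made-up idle reading; the other two come from the text above.
for sample in ("1024/1024", "500/1024", "0/1024"):
    status = "review session table" if nat_table_needs_review(sample) else "ok"
    print(sample, status)
```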

 

If everything looks good there, the only other thing I can think of is possibly the power outlet. There may be some kind of electrical discharge that is happening every 1.5-2 hours causing the router to freeze up. If possible, I would try to move it to a different location and see if that helps.

 

Let us know how it goes,

David T

If you encounter any issues with your service or equipment, I recommend checking out our Troubleshoot & Resolve solutions to help diagnose the issue.
Message 9 of 16 (897 Views)
Employee

Re: U-verse down every 1.5 - 2 hours

Hey DavidCS,

 

Thanks for looking into this problem. I will definitely have a look at the NAT table the next available time. You are correct on trying to access the RG from layer 4 and 3, via http://192.168.1.254. Although layers 4 & 3 became unreachable after an hour and 36 minutes, I still see layer 2 frames on the wire with the source MAC of the RG, but the RG doesn’t respond to bootpc broadcasts. Like two ships passing in the night. That’s what originally led me to suggest replacing the RG. Three RG’s later, I’m not so sure anymore. I really wish the 3801HGV had a console/serial port I could use for out of band access.

 

Another interesting tidbit is that with no cables plugged into the RG except power, it stayed up/reachable for at least 16 hours, and probably would have stayed up for more. I was able to log into http://192.168.1.254 and check logs via wi-fi the entire time. It was only when I plugged DSL back in that it went unreachable in 1:36.

 

I also like your idea of moving the RG power to another power strip. I have it in a switched PDU currently, but will move it over to a power strip in another outlet and test the next chance I get. Moving it any further is going to be problematic, unless I move it outside to the NID. Family is visiting for the next 5 days so my U-verse test time is going to slow to a crawl from the relative frantic pace over the past month.

 

To close the loop on my earlier post, the factory reset did not resolve the problem. The RG went unavailable in 1:36.

 

I’ll let you, and the community, know how the next round of testing goes.

Message 10 of 16 (882 Views)
Community Support

Re: U-verse down every 1.5 - 2 hours

Thank you for all that information. I found this to be very interesting as well.

 


noguanoguy wrote:

 

Another interesting tidbit is that with no cables plugged into the RG except power, it stayed up/reachable for at least 16 hours, and probably would have stayed up for more. I was able to log into http://192.168.1.254 and check logs via wi-fi the entire time. It was only when I plugged DSL back in that it went unreachable in 1:36.

 


Maybe there is an issue at the NID causing it to freeze up. For some reason, I am still thinking electrical. I know that the NID is supposed to be fully grounded, and we even put protectors in there to prevent any electrical discharges, but I have seen huge lightning storms where the electricity blows out the protectors and fries the RG.

 

That is probably a reach, but I've learned in the troubleshooting world to never rule anything out. If you are still noticing issues after trying everything else, let me know, and I will be glad to see what we can do about getting someone out there to check that.

 

-David T

Message 11 of 16 (873 Views)
Employee

Re: U-verse down every 1.5 - 2 hours

Hey David T.

 

I had a chance to look at the NAT table today, and during the hour and 34 minutes of RG reachability it never went below 986 of 1024 available translations. I’m also using 0 of 512 inbound translations and 0 pinhole rules. That data is not surprising, considering I only have one host connected to the RG, and recently factory reset the RG.

 

Also, I moved the power cable over to the other side of the room, plugged it back in, and as mentioned above the RG was reachable for one hour and 34 minutes.
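
Collecting the freeze times reported so far in this thread makes the pattern easier to see. The sketch below is an illustration, not part of the original posts: the run labels are paraphrases, and only runs with an exact reported time are included (the 1:30 to 1:42 window from the coax test is left out).

```python
from statistics import mean, pstdev

# Time-to-freeze reported in this thread, as "H:MM".
RUNS = {
    "switch disconnected": "1:38",
    "DSL reconnected after power-only test": "1:36",
    "after factory reset": "1:36",
    "power moved across the room": "1:34",
}

def to_minutes(hmm):
    """Convert an "H:MM" string to whole minutes."""
    hours, minutes = hmm.split(":")
    return int(hours) * 60 + int(minutes)

minutes = [to_minutes(value) for value in RUNS.values()]
print(f"mean  : {mean(minutes):.1f} min")
print(f"range : {min(minutes)} to {max(minutes)} min")
print(f"stddev: {pstdev(minutes):.1f} min")
```

It is a small sample, but the four runs all land within two minutes of 96 minutes no matter what was changed on the LAN or power side, which fits the earlier observation that only the DSL connection seems to make a difference.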

 

I haven’t been out to the NID yet, but I’m curious to have a look around possibly this weekend.

 

Thanks!

Message 12 of 16 (841 Views)
Community Support

Re: U-verse down every 1.5 - 2 hours

Let me know how it goes, and if we need a tech to go out and replace anything, I can get that set up.

 

-David T

Message 13 of 16 (836 Views)
Employee

Re: U-verse down every 1.5 - 2 hours

Hey David T,

 

I checked the NID, and I see the POTS line and the DSL line punched and connected to the correct pairs for my inside wiring. I don’t use the POTS line anymore (from a job a couple of incarnations ago), so I can try that pair in an effort to rule out inside wiring. I also verified the ground from the NID is bound to the common ground beneath the meter. I don’t want to create havoc for the next prem tech, so I’ll wait until I hear from you before proceeding. 

Message 14 of 16 (797 Views)
Community Support

Re: U-verse down every 1.5 - 2 hours

Hi noguanoguy,

 

It sounds like everything is in order from there. The only other possible thing I can think of is the port, and we may want to try to change to a different one. I will be sending you a private message to discuss this option further in detail, because it is account specific.

 

-David T

Message 15 of 16 (790 Views)