Network Issue

Contributor


OK, so after tearing my hair out over this connection issue for the past week or so, I am coming here for some possible help.  First, a bit of background...

 

A few friends and I have set up a server and colocated it at a datacenter in Atlanta.  The server is for sharing files on a large project we are working on and, since we have the space (why not?), for storing our personal files as a backup.  The idea was that this server, with its gigabit uplink shared between the 5 of us, could provide very good speeds and as much storage as we want for a really quite affordable price (about $10 a month each).

 

Being the one who has messed around with Linux and is more than comfortable in a server command-line environment, I volunteered to administer the server.  For context, two of us are on Comcast connections, two on Cox, and one on U-verse.  Now, on to the good stuff...

 

About a week ago (the server had been set up for about a month), the person with U-verse at home called me up and told me the update to the project files was taking him forever to download (these files can be large: 18 MP TIFF images in Photoshop can be hundreds of megabytes).  He has a U-verse 20Mb/s connection, so at its best he sees about 2MB/s.  In this instance he was seeing about 150KB/s (~1.2Mb/s).  So I started investigating.
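As a quick sanity check on those numbers, here is a minimal Python sketch of the bits-to-bytes arithmetic. It assumes decimal units (1 MB = 1,000 KB) and ignores protocol overhead; the function names are made up for illustration.

```python
def mbps_to_mbytes(megabits_per_s: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return megabits_per_s / 8


def kbytes_to_mbps(kilobytes_per_s: float) -> float:
    """Convert kilobytes per second to megabits per second (decimal units)."""
    return kilobytes_per_s * 8 / 1000


line_rate = mbps_to_mbytes(20)   # theoretical ceiling of a 20 Mb/s line
observed = kbytes_to_mbps(150)   # the slow evening transfers

print(f"20 Mb/s line  = {line_rate:.1f} MB/s before protocol overhead")
print(f"150 KB/s seen = {observed:.1f} Mb/s")
```

The 2.5 MB/s ceiling is consistent with about 2MB/s in practice once overhead is taken off, and 150KB/s works out to roughly 6% of the line rate.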

 

First I checked everyone else's connection: all good.  Everyone is pulling full speed from the server.  OK, so maybe it's his connection.  I SSH'd into his computer and downloaded a file from SoftLayer's wdc01 test server...full 2MB/s speed.  Hmmm.

 

So I contacted the datacenter after a traceroute showed the traffic going through Cogent (hate Cogent).  They rerouted it through nLayer the next morning.  This is where it got a little more interesting.  Speeds were fine all day the next day.  Fixed, right?  Wrong.  At about 7PM the speeds dropped back to 100KB/s and remained there until midnight.  And this has continued like clockwork (give or take an hour) ever since, for a week.  So clearly we have a congestion issue here.  As you can see from this traceroute (from the server to U-verse) taken during the affected hours, the problems appear to start when traffic hits the AT&T network:

 

traceroute to 99.104.128.49 (99.104.128.49), 30 hops max, 60 byte packets
1 (server ip) (---------) 0.294 ms 0.368 ms 0.443 ms
2 dct-ds42-ve-41.dacentec.com (199.241.190.229) 1.718 ms 0.227 ms 1.865 ms
3 199.191.57.193 (199.191.57.193) 0.292 ms 0.429 ms 0.533 ms
4 dct-cr03--v51.dacentec.com (199.255.156.54) 8.950 ms 9.062 ms 9.153 ms
5 v795.cr01.atl01.mzima.net (67.199.136.161) 8.888 ms 8.908 ms 8.873 ms
6 vlan-74.ar1.atl1.us.nlayer.net (69.31.135.166) 11.639 ms 11.449 ms 13.371 ms
7 as2914.ae9.ar1.atl1.us.nlayer.net (69.31.135.173) 9.493 ms ae-8.r04.atlnga05.us.bb.gin.ntt.net (204.2.241.93) 10.851 ms as2914.ae9.ar1.atl1.us.nlayer.net (69.31.135.173) 9.554 ms
8 192.205.36.157 (192.205.36.157) 46.195 ms ae-1.r04.atlnga05.us.bb.gin.ntt.net (129.250.5.189) 10.818 ms 192.205.36.157 (192.205.36.157) 50.339 ms
9 cr1.attga.ip.att.net (12.122.141.234) 56.852 ms 192.205.36.157 (192.205.36.157) 48.690 ms 48.683 ms
10 cr1.attga.ip.att.net (12.122.141.234) 52.851 ms cr2.wswdc.ip.att.net (12.122.1.174) 61.476 ms cr1.attga.ip.att.net (12.122.141.234) 52.759 ms
11 cr2.wswdc.ip.att.net (12.122.1.174) 52.369 ms 58.619 ms cr2.n54ny.ip.att.net (12.122.3.37) 56.966 ms
12 cr2.n54ny.ip.att.net (12.122.3.37) 57.135 ms 12.122.156.13 (12.122.156.13) 52.616 ms 56.612 ms
13 12.122.156.13 (12.122.156.13) 51.161 ms * *
14 * * *
... [15-29 removed for brevity]

30 * * *
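For anyone who would rather not eyeball the output, here is a rough Python sketch that finds the first hop where latency jumps. The helper names and the 20 ms threshold are my own assumptions, and the sample is just hops 6 to 8 copied from the trace above.

```python
import re
from statistics import median

# Matches "46.195 ms" style round-trip times, but not pieces of IP addresses.
RTT = re.compile(r"(?<![\d.])(\d+(?:\.\d+)?) ms")


def hop_rtts(traceroute_text):
    """Return [(hop_number, [rtt_ms, ...])] for every hop that answered."""
    hops = []
    for line in traceroute_text.splitlines():
        m = re.match(r"\s*(\d+)\s", line)
        if not m:
            continue  # header line or blank line
        rtts = [float(x) for x in RTT.findall(line)]
        if rtts:  # hops showing only "* * *" have no timings
            hops.append((int(m.group(1)), rtts))
    return hops


def first_jump(hops, threshold_ms=20.0):
    """Return the first hop whose median RTT exceeds the previous hop's by threshold_ms."""
    for (_, prev), (hop, cur) in zip(hops, hops[1:]):
        if median(cur) - median(prev) > threshold_ms:
            return hop
    return None


SAMPLE = """\
6 vlan-74.ar1.atl1.us.nlayer.net (69.31.135.166) 11.639 ms 11.449 ms 13.371 ms
7 as2914.ae9.ar1.atl1.us.nlayer.net (69.31.135.173) 9.493 ms ae-8.r04.atlnga05.us.bb.gin.ntt.net (204.2.241.93) 10.851 ms as2914.ae9.ar1.atl1.us.nlayer.net (69.31.135.173) 9.554 ms
8 192.205.36.157 (192.205.36.157) 46.195 ms ae-1.r04.atlnga05.us.bb.gin.ntt.net (129.250.5.189) 10.818 ms 192.205.36.157 (192.205.36.157) 50.339 ms
"""

print(first_jump(hop_rtts(SAMPLE)))
```

Treat the result as a hint only. Routers often deprioritize the replies traceroute relies on, and a hop that answers from several paths mixes their timings, so a jump that does not persist to later hops may not mean much.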

 

And here is a traceroute BACK (also during affected time):

 

traceroute to server ip (-------------), 30 hops max, 60 byte packets
1 homeportal (----------) 2.132 ms 2.064 ms 16.012 ms
2 99-104-128-3.lightspeed.wlfrct.sbcglobal.net (99.104.128.3) 1118.211 ms 1118.182 ms 1118.134 ms
3 * * *
4 12.83.32.1 (12.83.32.1) 38.763 ms 38.737 ms 38.695 ms
5 ggr4.n54ny.ip.att.net (12.122.105.73) 44.348 ms 44.303 ms 44.258 ms
6 be3004.ccr21.jfk07.atlas.cogentco.com (154.54.12.93) 44.159 ms 24.123 ms 27.447 ms
7 be2058.mpd21.jfk02.atlas.cogentco.com (154.54.46.69) 27.267 ms be2057.ccr22.jfk02.atlas.cogentco.com (154.54.80.177) 26.893 ms be2058.mpd21.jfk02.atlas.cogentco.com (154.54.46.69) 26.853 ms
8 be2148.ccr21.dca01.atlas.cogentco.com (154.54.31.117) 34.591 ms be2149.ccr22.dca01.atlas.cogentco.com (154.54.31.125) 31.190 ms 34.367 ms
9 te2-7.ccr01.clt01.atlas.cogentco.com (154.54.30.146) 49.443 ms te4-4.ccr01.clt01.atlas.cogentco.com (154.54.84.254) 47.811 ms te7-7.ccr01.clt01.atlas.cogentco.com (154.54.85.6) 45.970 ms
10 dacentec.demarc.cogentco.com (38.104.168.10) 71.548 ms 72.044 ms 79.791 ms
11 dct-ds42-ve.dacentec.com (199.191.57.194) 79.755 ms 72.747 ms 70.558 ms
12 dct-ds08-vl-41.dacentec.com (199.241.190.230) 72.356 ms 70.428 ms 70.696 ms
13 server ip (-----------) 73.600 ms 73.482 ms 74.481 ms

There is a ticket out with nLayer through the datacenter; however, the network engineers at both the datacenter and nLayer seem to think the problem resides on the AT&T network.

 

Any insight?

Message 1 of 3
ACE - Professor

Re: Network Issue

First thing: at each remote location, use an internet speed test to determine upload and download speeds; pcpitstop.com is one such place.

 

The U-verse connection is not a business link, which is why it is less expensive, and its upload speed is significantly lower than its download speed. This can result in a poor experience like the one you described. Also, this connection does not have a fixed path from point A to point B. The path can change at any time, usually due to other data traffic but also due to outages.

 

In addition, many server apps are designed for LAN connections and may have inefficient file transfer methods that are masked by the high speed of a LAN but exposed when running over a WAN (wide area network).

 

While a traceroute can provide valuable information for problem determination, you should also consider using a tool (I use Wireshark on my Mac) to record the data flow. Evaluating the data flow could reveal a tuning or configuration issue. One thing to look for is a TCP windowing configuration issue. You may also consider altering the MTU size; a smaller MTU sometimes produces better results, but you need to run a ping test with multiple packet sizes to determine the correct size.
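To put rough numbers on the windowing and MTU points, here is a small Python sketch. It assumes the 20 Mb/s line and the roughly 73 ms round trip seen in the traceroute back to the server; the function names and the list of candidate MTUs are illustrative only.

```python
def bdp_bytes(bandwidth_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the path full."""
    return bandwidth_mbps * 1_000_000 / 8 * (rtt_ms / 1000)


def window_limited_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Throughput ceiling of one TCP stream that sends one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000


def ping_payloads(mtus=(1500, 1492, 1472, 1400)):
    """ICMP payload that exactly fills each candidate MTU over IPv4.

    The 28 bytes subtracted are the 20-byte IPv4 header plus the 8-byte ICMP header.
    """
    return {mtu: mtu - 28 for mtu in mtus}


# Window needed to fill a 20 Mb/s line at 73 ms round-trip time.
print(f"window needed: {bdp_bytes(20, 73):,.0f} bytes")

# Ceiling for a 65,535-byte window (the maximum without TCP window scaling).
print(f"64 KB window ceiling: {window_limited_mbps(65535, 73):.2f} Mb/s")

# Payload sizes to try in a don't-fragment ping test
# (on Linux iputils ping, something like: ping -M do -s <payload> <host>).
for mtu, payload in ping_payloads().items():
    print(f"MTU {mtu}: ping payload {payload} bytes")
```

If a capture shows the advertised window staying near 64 KB, a single stream at this latency would top out at around 7 Mb/s regardless of line speed. That alone would not explain a drop that appears only in the evening, but it is worth ruling out.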

 

Your issue is not an easy one to resolve.

_____________________________________________________
*The views and opinions expressed on this forum are purely my own. Any product claim, statistic, quote, or other representation about a product or service should be verified with the manufacturer, provider, or party.
Message 2 of 3
Contributor

Re: Network Issue

Well, I had my data center try a few different transit providers in their mix, and sending the traffic out over another tier 1 fixed the problem.

While I appreciate the further steps to troubleshoot the issue, the evidence points to a clear congestion issue at the peering point between nLayer and AT&T in this area: an IDENTICAL server hosted at another datacenter had no issues, other traffic coming to the U-verse host had no issues, the slowdowns were cyclical (occurring regularly between 7PM and midnight), and now changing the transit provider has fixed the problem.

nLayer has been notified by the data center. I was really just trying to make AT&T aware of the problem as well, hoping that some kind of U-verse support person reads these forums.
Message 3 of 3