robots.txt (#32)
Closed
Eric Dziewa opened 8 years ago
Due Date: 2016-09-05

I was looking into why projects.tigase.org is slow to respond and I see these connections. Would it be possible to add a robots.txt to disallow googlebot indexing the site?

root@t2 ~/ # lsof -i -P |fgrep ':8080'      
java       2528      tigase   60u  IPv6 33354620      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-175.googlebot.com:58239 (CLOSE_WAIT)
java       2528      tigase   73u  IPv6 43033954      0t0  TCP projects.tigase.org:8080->tasarimbienali.iksv.org:39632 (CLOSE_WAIT)
java       2528      tigase   75u  IPv6 54768771      0t0  TCP projects.tigase.org:8080->62.28.167.234:58594 (CLOSE_WAIT)
java       2528      tigase   77u  IPv6 32621068      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-170.googlebot.com:65477 (CLOSE_WAIT)
java       2528      tigase   78u  IPv6 87173572      0t0  TCP t2.tigase.org:8080->crawl-66-249-66-143.googlebot.com:35121 (CLOSE_WAIT)
java       2528      tigase   80u  IPv6 33348959      0t0  TCP t2.tigase.org:8080->crawl-66-249-69-170.googlebot.com:60137 (CLOSE_WAIT)
java       2528      tigase   82u  IPv6 32821313      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-170.googlebot.com:55207 (CLOSE_WAIT)
java       2528      tigase   83u  IPv6 33750774      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-170.googlebot.com:60523 (CLOSE_WAIT)
java       2528      tigase   85u  IPv6 32941982      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-180.googlebot.com:33540 (CLOSE_WAIT)
java       2528      tigase   88u  IPv6 31587041      0t0  TCP t2.tigase.org:8080->crawl-66-249-75-191.googlebot.com:55831 (CLOSE_WAIT)
java       2528      tigase   89u  IPv6 43033955      0t0  TCP t2.tigase.org:8080->tasarimbienali.iksv.org:51102 (CLOSE_WAIT)
java       2528      tigase   94u  IPv6 87252787      0t0  TCP t2.tigase.org:8080->crawl-66-249-66-135.googlebot.com:43364 (CLOSE_WAIT)
java       2528      tigase   97u  IPv6 31568454      0t0  TCP t2.tigase.org:8080->crawl-66-249-75-191.googlebot.com:42527 (CLOSE_WAIT)
java       2528      tigase  100u  IPv6 43033961      0t0  TCP mail.tigase.org:8080->tasarimbienali.iksv.org:49095 (CLOSE_WAIT)
java       2528      tigase  101u  IPv6 32430902      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-180.googlebot.com:41094 (CLOSE_WAIT)
java       2528      tigase  104u  IPv6 30644770      0t0  TCP projects.tigase.org:8080->static-103-231-78-228.ctrls.in:49100 (CLOSE_WAIT)
java       2528      tigase  105u  IPv6 31591997      0t0  TCP t2.tigase.org:8080->crawl-66-249-75-191.googlebot.com:54978 (CLOSE_WAIT)
java       2528      tigase  106u  IPv6 54768772      0t0  TCP mail.tigase.org:8080->62.28.167.234:53855 (CLOSE_WAIT)
java       2528      tigase  109u  IPv6 32437039      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-180.googlebot.com:40067 (CLOSE_WAIT)
java       2528      tigase  110u  IPv6 31595682      0t0  TCP t2.tigase.org:8080->crawl-66-249-75-199.googlebot.com:57322 (CLOSE_WAIT)
java       2528      tigase  111u  IPv6 32822693      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-170.googlebot.com:53108 (CLOSE_WAIT)
java       2528      tigase  112u  IPv6 33341510      0t0  TCP t2.tigase.org:8080->crawl-66-249-69-180.googlebot.com:61553 (CLOSE_WAIT)
java       2528      tigase  113u  IPv6 41618248      0t0  TCP projects.tigase.org:8080->61.161.130.75:52515 (ESTABLISHED)
java       2528      tigase  114u  IPv6 34486225      0t0  TCP t2.tigase.org:8080->crawl-66-249-75-191.googlebot.com:46862 (CLOSE_WAIT)
java       2528      tigase  115u  IPv6 32607210      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-175.googlebot.com:41080 (CLOSE_WAIT)
java       2528      tigase  116u  IPv6 33361497      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-175.googlebot.com:35268 (CLOSE_WAIT)
java       2528      tigase  117u  IPv6 31595991      0t0  TCP t2.tigase.org:8080->crawl-66-249-75-191.googlebot.com:50445 (CLOSE_WAIT)
java       2528      tigase  118u  IPv6 32823258      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-170.googlebot.com:60052 (CLOSE_WAIT)
java       2528      tigase  119u  IPv6 34489058      0t0  TCP t2.tigase.org:8080->crawl-66-249-75-207.googlebot.com:43483 (CLOSE_WAIT)
java       2528      tigase  122u  IPv6 31571197      0t0  TCP t2.tigase.org:8080->crawl-66-249-75-191.googlebot.com:49015 (CLOSE_WAIT)
java       2528      tigase  130u  IPv6 33937288      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-170.googlebot.com:36163 (CLOSE_WAIT)
java       2528      tigase  133u  IPv6 32448016      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-175.googlebot.com:56545 (CLOSE_WAIT)
java       2528      tigase  134u  IPv6 31575556      0t0  TCP t2.tigase.org:8080->crawl-66-249-75-191.googlebot.com:43996 (CLOSE_WAIT)
java       2528      tigase  136u  IPv6 54768775      0t0  TCP t2.tigase.org:8080->62.28.167.234:41389 (CLOSE_WAIT)
java       2528      tigase  138u  IPv6 31579442      0t0  TCP t2.tigase.org:8080->crawl-66-249-75-191.googlebot.com:61722 (CLOSE_WAIT)
java       2528      tigase  139u  IPv6 41618055      0t0  TCP projects.tigase.org:8080->61.161.130.75:60631 (ESTABLISHED)
java       2528      tigase  140u  IPv6 30644771      0t0  TCP mail.tigase.org:8080->static-103-231-78-228.ctrls.in:40569 (CLOSE_WAIT)
java       2528      tigase  142u  IPv6 32616651      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-180.googlebot.com:50008 (CLOSE_WAIT)
java       2528      tigase  143u  IPv6 25945458      0t0  TCP *:8080 (LISTEN)
java       2528      tigase  153u  IPv6 29733403      0t0  TCP projects.tigase.org:8080->li845-66.members.linode.com:51458 (CLOSE_WAIT)
java       2528      tigase  154u  IPv6 29733405      0t0  TCP t2.tigase.org:8080->li845-66.members.linode.com:56890 (CLOSE_WAIT)
java       2528      tigase  158u  IPv6 29733419      0t0  TCP mail.tigase.org:8080->li845-66.members.linode.com:36280 (CLOSE_WAIT)
java       2528      tigase  159u  IPv6 32950381      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-180.googlebot.com:46034 (CLOSE_WAIT)
java       2528      tigase  160u  IPv6 33361893      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-170.googlebot.com:58908 (CLOSE_WAIT)
java       2528      tigase  162u  IPv6 32623305      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-175.googlebot.com:45775 (CLOSE_WAIT)
java       2528      tigase  165u  IPv6 32450756      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-175.googlebot.com:62562 (CLOSE_WAIT)
java       2528      tigase  167u  IPv6 33933888      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-170.googlebot.com:38793 (CLOSE_WAIT)
java       2528      tigase  168u  IPv6 32958562      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-170.googlebot.com:51637 (CLOSE_WAIT)
java       2528      tigase  169u  IPv6 30644772      0t0  TCP t2.tigase.org:8080->static-103-231-78-228.ctrls.in:54697 (CLOSE_WAIT)
java       2528      tigase  172u  IPv6 59420953      0t0  TCP t2.tigase.org:8080->222.186.21.111:3300 (ESTABLISHED)
java       2528      tigase  173u  IPv6 33755059      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-170.googlebot.com:39764 (CLOSE_WAIT)
java       2528      tigase  176u  IPv6 32442189      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-175.googlebot.com:49177 (CLOSE_WAIT)
java       2528      tigase  309u  IPv6 32951001      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-175.googlebot.com:55565 (CLOSE_WAIT)
java       2528      tigase  312u  IPv6 87131585      0t0  TCP t2.tigase.org:8080->crawl-66-249-66-139.googlebot.com:38092 (CLOSE_WAIT)
java       2528      tigase  320u  IPv6 32698665      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-180.googlebot.com:41844 (CLOSE_WAIT)
java       2528      tigase  329u  IPv6 32626095      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-170.googlebot.com:45817 (CLOSE_WAIT)
java       2528      tigase  331u  IPv6 87262324      0t0  TCP t2.tigase.org:8080->crawl-66-249-66-135.googlebot.com:46368 (CLOSE_WAIT)
java       2528      tigase  335u  IPv6 87260162      0t0  TCP t2.tigase.org:8080->crawl-66-249-66-135.googlebot.com:39648 (CLOSE_WAIT)
java       2528      tigase  341u  IPv6 41618075      0t0  TCP projects.tigase.org:8080->61.161.130.75:50399 (ESTABLISHED)
java       2528      tigase  343u  IPv6 59421140      0t0  TCP t2.tigase.org:8080->222.186.21.111:1333 (ESTABLISHED)
java       2528      tigase  345u  IPv6 32957115      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-175.googlebot.com:49184 (CLOSE_WAIT)
java       2528      tigase  349u  IPv6 41618076      0t0  TCP projects.tigase.org:8080->61.161.130.75:51558 (ESTABLISHED)
java       2528      tigase  350u  IPv6 87162685      0t0  TCP t2.tigase.org:8080->crawl-66-249-66-143.googlebot.com:63014 (CLOSE_WAIT)
java       2528      tigase  352u  IPv6 32631823      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-170.googlebot.com:40874 (CLOSE_WAIT)
java       2528      tigase  355u  IPv6 41618332      0t0  TCP projects.tigase.org:8080->61.161.130.75:62237 (ESTABLISHED)
java       2528      tigase  356u  IPv6 87180628      0t0  TCP t2.tigase.org:8080->crawl-66-249-66-143.googlebot.com:56939 (CLOSE_WAIT)
java       2528      tigase  359u  IPv6 35469568      0t0  TCP t2.tigase.org:8080->111.74.239.231:4607 (ESTABLISHED)
java       2528      tigase  361u  IPv6 62211298      0t0  TCP t2.tigase.org:8080->112.64.235.252:41364 (CLOSE_WAIT)
java       2528      tigase  367u  IPv6 33754426      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-170.googlebot.com:52881 (CLOSE_WAIT)
java       2528      tigase  371u  IPv6 41618367      0t0  TCP projects.tigase.org:8080->61.161.130.75:53967 (ESTABLISHED)
java       2528      tigase  376u  IPv6 32698305      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-170.googlebot.com:54536 (CLOSE_WAIT)
java       2528      tigase  384u  IPv6 32962698      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-175.googlebot.com:56899 (CLOSE_WAIT)
java       2528      tigase  386u  IPv6 41622535      0t0  TCP projects.tigase.org:8080->61.161.130.75:49580 (ESTABLISHED)
java       2528      tigase  483u  IPv6 41618277      0t0  TCP projects.tigase.org:8080->61.161.130.75:57355 (ESTABLISHED)
java       2528      tigase  487u  IPv6 33179017      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-170.googlebot.com:42469 (CLOSE_WAIT)
java       2528      tigase  488u  IPv6 33179611      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-170.googlebot.com:44547 (CLOSE_WAIT)
java       2528      tigase  489u  IPv6 33187176      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-170.googlebot.com:48616 (CLOSE_WAIT)
java       2528      tigase  491u  IPv6 33192047      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-170.googlebot.com:47686 (CLOSE_WAIT)
java       2528      tigase  493u  IPv6 33755041      0t0  TCP t2.tigase.org:8080->crawl-66-249-75-207.googlebot.com:45562 (CLOSE_WAIT)
java       2528      tigase  495u  IPv6 41618291      0t0  TCP projects.tigase.org:8080->61.161.130.75:58587 (ESTABLISHED)
java       2528      tigase  496u  IPv6 41618301      0t0  TCP projects.tigase.org:8080->61.161.130.75:59841 (ESTABLISHED)
java       2528      tigase  500u  IPv6 41622537      0t0  TCP projects.tigase.org:8080->61.161.130.75:51821 (ESTABLISHED)
java       2528      tigase  508u  IPv6 41618406      0t0  TCP projects.tigase.org:8080->61.161.130.75:61182 (ESTABLISHED)
java       2528      tigase  521u  IPv6 33370211      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-170.googlebot.com:48469 (CLOSE_WAIT)
java       2528      tigase  531u  IPv6 33758735      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-175.googlebot.com:36549 (CLOSE_WAIT)
java       2528      tigase  541u  IPv6 62211389      0t0  TCP t2.tigase.org:8080->112.64.235.252:41898 (CLOSE_WAIT)
java       2528      tigase  550u  IPv6 33764612      0t0  TCP t2.tigase.org:8080->crawl-66-249-75-199.googlebot.com:63689 (CLOSE_WAIT)
java       2528      tigase  551u  IPv6 33757572      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-170.googlebot.com:42638 (CLOSE_WAIT)
java       2528      tigase  552u  IPv6 33764870      0t0  TCP t2.tigase.org:8080->crawl-66-249-64-175.googlebot.com:47435 (CLOSE_WAIT)
java       2528      tigase  565u  IPv6 62213859      0t0  TCP t2.tigase.org:8080->.:52234 (CLOSE_WAIT)
Eric Dziewa commented 8 years ago

Also, would it be possible to bind only to the tigase.org interface instead of all of them, so that the URL cannot be accessed via projects.tigase.org?

root@t2 ~/ # dig +short projects.tigase.org   
198.27.120.210
root@t2 ~/ # dig +short tigase.org         
198.27.120.208
Eric Dziewa commented 8 years ago

I see entries for mail.tigase.org:8080 too.

root@t2 ~/ # dig +short mail.tigase.org
198.27.120.209
Andrzej Wójcik (Tigase) commented 8 years ago

I looked into this issue, and it looks like other people have had problems with googlebot and CLOSE_WAIT connections. Usually they tried to fix it with robots.txt: for some it worked, while others banned googlebot IP addresses because the robots.txt file did not help.

I also see that some connections from sources other than googlebot ended up in CLOSE_WAIT. If they are bots, maybe we should ban every bot in robots.txt by default when we add support for this file?

Right now we do not support serving a robots.txt file; this would need to be added. Also, we bind to all IP addresses on every interface, and for now there is no way to change that.

On our installation at t2.tigase.org we are using the embedded HTTP server that comes with the Java JDK. We also support an alternative HTTP server (Jetty), which in my opinion is a more mature project than the HTTP server embedded in the JDK and may provide us with better performance and better handling of issues, e.g. maybe it will be able to deal with the CLOSE_WAIT issue.

I currently have my own installation running with Jetty as the HTTP server, and I do not have any connections to port 8080 in the CLOSE_WAIT state.

For now I think we could try switching to Jetty as the HTTP server for our HTTP API at t2.tigase.org and check whether it helps.

In the meantime I can work on these tasks:

  • add support to force HTTP server to use exact IP address

  • add support for serving robots.txt file on a per-domain basis
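A minimal sketch of what these two tasks could look like with the HTTP server embedded in the JDK (com.sun.net.httpserver). This is not Tigase code: the class name RobotsTxtSketch, its methods and the DISALLOW_ALL constant are invented for illustration, and per-domain handling (for example by inspecting the Host header) is left out.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: every name in this class is an assumption, not Tigase API.
class RobotsTxtSketch {

    // Asks every crawler to stay away from every path.
    static final String DISALLOW_ALL = "User-agent: *\nDisallow: /\n";

    // Binds to one address only, instead of the wildcard (*:8080 in the lsof output).
    static HttpServer start(InetAddress bindAddress, int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(bindAddress, port), 0);
            server.createContext("/robots.txt", RobotsTxtSketch::serveRobots);
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void serveRobots(HttpExchange exchange) throws IOException {
        byte[] body = DISALLOW_ALL.getBytes(StandardCharsets.UTF_8);
        try {
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
        } finally {
            exchange.close(); // always release the connection
        }
    }

    // Small client helper for trying the sketch out.
    static String fetch(HttpServer server, String path) {
        try {
            InetSocketAddress bound = server.getAddress();
            URL url = new URL("http", bound.getAddress().getHostAddress(), bound.getPort(), path);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            conn.setReadTimeout(5000);
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling RobotsTxtSketch.start(InetAddress.getByName("198.27.120.208"), 8080) would listen only on the tigase.org address from the dig output above. robots.txt is advisory, so crawlers that ignore it would still connect.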

%kobit What do you think about this change? I know you wanted to use the HTTP server embedded in the JDK; however, I'm not sure we should use it, or even promote this HTTP server implementation, if the CLOSE_WAIT issue goes away with Jetty.

Do you agree on this change in configuration of t2.tigase.org?

Artur Hefczyc commented 8 years ago

I am in favor of using as few external dependencies as possible. So using HTTP server embedded in JDK is the preferred way. However, I am ok with deploying Jetty as a workaround for now and working on a solution for embedded JDK.

The CLOSE_WAIT connections are a problem not only for the HTTP server (REST API) but also for BOSH connections and maybe WebSockets. I remember reading about handling CLOSE_WAIT connections through the TCP/IP connection settings. It is possible to change the CLOSE_WAIT timeout so that such connections are closed quickly and do not consume resources. I do not remember the details though. It was quite a few years ago, when I was working with Nasza Klasa on integrating Tigase with their system. For sure, the CLOSE_WAIT timeout can be changed on the OS level. It might also be possible to change it on the JVM level.

  1. %eric please check if you can adjust CLOSE_WAIT timeout on the OS level

  2. %andrzej.wojcik please check whether it is possible to adjust the CLOSE_WAIT timeout within the JVM, and also continue with the plan described above - install Jetty and work on a solution for the embedded HTTP server

Eric Dziewa commented 8 years ago

I've lowered the timeout to 5 minutes in sysctl.conf.

# was 7200
net.ipv4.tcp_keepalive_time = 300

Also disallowed port 8080 connections to projects/mail IPs.

iptables -I INPUT -p tcp -d 198.27.120.210 -m tcp --dport 8080 -j DROP
iptables -I INPUT -p tcp -d 198.27.120.209 -m tcp --dport 8080 -j DROP
Andrzej Wójcik (Tigase) commented 8 years ago

I looked at t2 and see that the old CLOSE_WAIT connections are still active. From what I can tell, these are still the same CLOSE_WAIT connections.

Because of that, I think that changing the setting at the OS level is not a working solution.

I tried to replicate this issue and was finally able to create a CLOSE_WAIT connection on my own. CLOSE_WAIT connections are connections that the client has closed but whose close has not been handled by the server. In our case it happens with the HTTP server embedded in the JDK when an unhandled exception occurs and asynchronous processing is used. I will probably work on this fix tomorrow.

I will also check if Jetty is vulnerable to the same issue.

I think that if this fixes our issue, then support for selecting the IP address the HTTP server binds to and support for robots.txt can be moved to a separate task, i.e. to be done in 7.2.0.

%kobit %eric Do you agree?

Artur Hefczyc commented 8 years ago

%eric I am pretty sure keepalive_time is not for CLOSE_WAIT and has no effect on it. Unfortunately, I do not remember which kernel settings had an effect on this state. However, I am thinking that maybe we could add iptables rules that reject connections from googlebot to some of our IP addresses?

%andrzej.wojcik I agree with your plan. I have spent very little time reading about the problem recently, so you probably know more about the issue and possible ways to solve it. From what I have read, it looks like this could/should be solved by an explicit socket close by the application. So, if Tigase could discover an idle connection, it could explicitly close it, resolving the issue. We have a good mechanism for detecting broken XMPP connections, but it does not monitor HTTP connections. So I am thinking, maybe having Tigase close idle HTTP connections after some timeout could be a solution? Another possible approach could be to have a configurable list of IPs within the HTTP server for which we refuse connections and explicitly close connections if they are open.
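The configurable list of refused IPs could be sketched as a filter on the JDK's embedded HTTP server. Everything here is hypothetical (the DenyListFilter class and its helper method are invented for illustration). A filter only runs after the connection has been accepted and the request parsed, so this is weaker than refusing the connection outright; it only guarantees that the exchange is answered and closed.

```java
import com.sun.net.httpserver.Filter;
import com.sun.net.httpserver.HttpContext;
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.util.Set;

// Hypothetical sketch: rejects requests from a fixed set of client addresses.
class DenyListFilter extends Filter {

    private final Set<InetAddress> denied;

    DenyListFilter(Set<InetAddress> denied) {
        this.denied = Set.copyOf(denied);
    }

    @Override
    public void doFilter(HttpExchange exchange, Filter.Chain chain) throws IOException {
        InetAddress remote = exchange.getRemoteAddress().getAddress();
        if (denied.contains(remote)) {
            try {
                exchange.sendResponseHeaders(403, -1); // -1: no response body
            } finally {
                exchange.close(); // explicitly close instead of leaving it open
            }
            return;
        }
        chain.doFilter(exchange); // not on the list: continue to the handler
    }

    @Override
    public String description() {
        return "Rejects requests from denied client addresses";
    }

    // Demo helper: starts a loopback server with the filter installed and
    // returns the status code that a local client receives.
    static int statusSeenByLocalClient(Set<InetAddress> denied) {
        try {
            HttpServer server = HttpServer.create(
                    new InetSocketAddress(InetAddress.getLoopbackAddress(), 0), 0);
            HttpContext context = server.createContext("/", exchange -> {
                try {
                    exchange.sendResponseHeaders(200, -1);
                } finally {
                    exchange.close();
                }
            });
            context.getFilters().add(new DenyListFilter(denied));
            server.start();
            try {
                InetSocketAddress bound = server.getAddress();
                URL url = new URL("http", bound.getAddress().getHostAddress(), bound.getPort(), "/");
                HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
                conn.setReadTimeout(5000);
                return conn.getResponseCode();
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A static address list only helps while the crawler keeps using the same addresses, so in practice it would need to be reloadable.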

Eric Dziewa commented 8 years ago

I can only adjust connection timeout using the kernel. Cannot do anything with CLOSE_WAIT.

From http://stackoverflow.com/a/17856372

"CLOSE_WAIT means that the local end of the connection has received a FIN from the other end, but the OS is waiting for the program at the local end to actually close its connection.

The problem is your program running on the local machine is not closing the socket. It is not a TCP tuning issue. A connection can (and quite correctly) stay in CLOSE_WAIT forever while the program holds the connection open."
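The behaviour described in the quote can be reproduced with plain sockets. In this minimal sketch (the class and method names are invented for illustration), the client closes first, the server-side read() reports end of stream, and the accepted socket stays in CLOSE_WAIT until the program calls close() on it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch of how a CLOSE_WAIT socket comes about.
class CloseWaitSketch {

    // Returns what the server-side read() yields after the client has closed.
    static int readAfterPeerClose() {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Socket client = new Socket(listener.getInetAddress(), listener.getLocalPort());
            Socket accepted = listener.accept();

            client.close(); // the client sends its FIN

            // Blocks until the FIN arrives, then signals end of stream with -1.
            int result = accepted.getInputStream().read();

            // Between the FIN arriving and the close() below, the kernel reports
            // the accepted socket as CLOSE_WAIT. A program that never reaches
            // this close() keeps the socket in that state indefinitely.
            accepted.close();
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The -1 from read() is the application's cue that the peer is gone; no kernel timeout closes the socket on the program's behalf.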

We cannot ban googlebot because that would also affect indexing of tigase.net and other sites running on t2 webserver.

The only way those connections are going away is restarting tigase.

Artur Hefczyc commented 8 years ago

%eric you are correct. We have to improve the code on the Tigase side if possible. I say if possible because we use the Java built-in HTTP server for this, and I am not sure how deeply we can customize the TCP/IP connection handling code. I remember that with Java 1.3 there was not much that could be done in cases like this.

As for the IP Tables rules, I thought that maybe we have Tigase running on a separate IP address from tigase.net and other websites, so then we could ban search bots from accessing HTTP server running on Tigase or maybe we could disallow googlebot access only to port 8080?

Another possible solution would be to put Tigase's HTTP service on a different port, like: 8888 or something like this. Then, googlebot would not try to access it.

Andrzej Wójcik (Tigase) commented 8 years ago

I changed our code to make sure the close() method of the class implementing the HttpExchange interface is called. This way it should properly close the input and output streams and, as a result, close the connection.

While testing it locally I was able to replicate a connection in CLOSE_WAIT, and after these changes it was no longer possible, so the fix should work.
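The general shape of such a fix, sketched against the JDK's com.sun.net.httpserver API rather than the actual Tigase change: the asynchronous work runs inside try/finally so that close() is reached even when the work throws. The AsyncCloseSketch class, its Work interface and the statusFor helper are all invented for illustration.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: guarantees HttpExchange.close() for asynchronous handlers.
class AsyncCloseSketch {

    // Request processing that may fail with any exception.
    interface Work {
        void run(HttpExchange exchange) throws Exception;
    }

    // Wraps the work so that the exchange is closed on every path.
    static HttpHandler closing(Work work, Executor executor) {
        return exchange -> CompletableFuture.runAsync(() -> {
            try {
                work.run(exchange);
            } catch (Exception e) {
                try {
                    exchange.sendResponseHeaders(500, -1); // best-effort error reply
                } catch (IOException headersAlreadySentOrPeerGone) {
                    // nothing more can be reported to the client
                }
            } finally {
                exchange.close(); // closes both streams and releases the connection
            }
        }, executor);
    }

    // Demo helper: serves one request with the given work, returns the status code.
    static int statusFor(Work work) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            HttpServer server = HttpServer.create(
                    new InetSocketAddress(InetAddress.getLoopbackAddress(), 0), 0);
            server.createContext("/", closing(work, pool));
            server.start();
            try {
                InetSocketAddress bound = server.getAddress();
                URL url = new URL("http", bound.getAddress().getHostAddress(), bound.getPort(), "/");
                HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
                conn.setReadTimeout(5000);
                return conn.getResponseCode();
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Without the finally block, an exception thrown by the work would leave the exchange open, and the connection would sit in CLOSE_WAIT once the client gave up.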

However, during testing I found another issue inside the embedded HTTP server: a race condition. In some rare cases two threads may try to read from the same input stream, which can cause a lock. Fortunately, our request timeout properly takes care of this by closing the stream. But there is one catch: in some cases the request may result in an HTTP response with an error code, in which case the client will need to send the request once again. It is a rare situation but it can happen.

%kobit I did whatever I could to deal with the HTTP server embedded in the JDK, including debugging the HTTP server implementation to track down the race condition leading to the lock, but I was not able to fix it. It is very hard to find the exact cause. The only solution is to make this HTTP server use a single thread, which is not good for performance (i.e. the admin page and Web UI will load very slowly with the single-thread solution).

Because of that, it is hard for me to say that it is usable in production: it can be used, but only for simple tasks.

I also checked Jetty: with Jetty in use, I was not able to replicate a CLOSE_WAIT connection on port 8080 locally. So in my opinion we should use Jetty in production deployments in the future.

%eric As for blocking googlebot - we can block it using iptables by blocking connections incoming from googlebot IP addresses to our 8080 port. This will not block requests from googlebot to our websites as they are hosted on port 80.

The code with my changes is ready and will be part of tomorrow's snapshot build. So it may be a good idea to deploy this next snapshot build on our servers (t2 and t6).

If this works, then the issue is fixed; if not, I will reconfigure the t2 and t6 servers to use Jetty.

Artur Hefczyc commented 8 years ago

%andrzej.wojcik am I correct that during normal use, I mean the way we or our customers use the HTTP interface to the Tigase server, the CLOSE_WAIT problem is negligible, if it exists at all? We are only aware of it because of googlebot and maybe other search engines? If so, I think all the fixes and improvements you made should be good enough. Also, about the deadlock problem: how likely is it to happen during normal use? If it happens, what are the implications for the end user?

I am OK with having the JDK embedded HTTP server as default for test/devel or light use installations and Jetty as default for production systems. We should have this covered in our documentation though, %daniel .

Andrzej Wójcik (Tigase) commented 8 years ago

%kobit

Yes, in the typical use case, when the server receives data in the expected format, I haven't seen any possibility of CLOSE_WAIT, so it is negligible.

The lock may happen from time to time. It is random, and it will result in 2 threads from the HTTP server processing pool being locked for one or two times the request timeout (from 1 to 2 minutes depending on the case). After that the server will return an HTTP error and the end user will need to repeat the request. Other requests will be processed normally and will not be blocked by this lock.

I will also add this information to the documentation for the HTTP project for version 7.2.0, which I'm working on. %daniel please add this information to the current version of the documentation (part of the admin documentation for Tigase XMPP Server).

Eric Dziewa commented 8 years ago

I've installed b4282 on t2 and t6. I will try to find an authoritative list of googlebot IPs this week.

Eric Dziewa commented 8 years ago

Had to revert to the previous build, b4222, because statistics cannot be uploaded. The cron job is failing.

eric@tpub:~$ curl -X POST --connect-timeout 60 --max-time 120 -sS --data-binary '<test/>' --header 'Content-Type: application/xml' http://192.95.36.80:8080/rest/stats/upload
curl: (28) Operation timed out after 120001 milliseconds with 0 bytes received

t2 and t6, however, have no problem uploading their own statistics locally with b4282.

Andrzej Wójcik (Tigase) commented 8 years ago

I looked into this issue and issue #4485 and found that the rare locks I observed on my local installation are far more common on t2, t6 and other server machines.

I compared this to a build without my latest changes, and it appears that the locks started when my changes to deal with CLOSE_WAIT were applied. So I reviewed my latest changes and found a few lines leading to concurrent execution of the same HTTP request. I changed them and tested on a local installation and on our remote test cluster, and I was no longer able to see any locks.

I suppose I solved the lock issue by fixing the concurrent execution.

The next snapshot build (tomorrow's) should work fine.

Eric Dziewa commented 8 years ago

Installed b4284. Statistics upload is working, everything seems fine.

Eric Dziewa commented 8 years ago

The IP addresses used by Googlebot change from time to time.

Seems yandex has found the URL now.

root@t2 ~/ # lsof -i -P |fgrep ':8080'
java      19105      tigase   69u  IPv6 116134696      0t0  TCP t2.tigase.org:8080->spider-5-255-250-27.yandex.com:38777 (CLOSE_WAIT)
java      19105      tigase   75u  IPv6 117533650      0t0  TCP t2.tigase.org:8080->spider-141-8-143-142.yandex.com:58498 (CLOSE_WAIT)
java      19105      tigase   83u  IPv6 116141639      0t0  TCP t2.tigase.org:8080->spider-141-8-143-142.yandex.com:40728 (CLOSE_WAIT)
java      19105      tigase   85u  IPv6 116141094      0t0  TCP t2.tigase.org:8080->spider-5-255-250-16.yandex.com:59913 (CLOSE_WAIT)
java      19105      tigase   88u  IPv6 116137184      0t0  TCP t2.tigase.org:8080->spider-141-8-143-216.yandex.com:49152 (CLOSE_WAIT)
java      19105      tigase   90u  IPv6 117404647      0t0  TCP t2.tigase.org:8080->spider-5-255-250-89.yandex.com:54209 (CLOSE_WAIT)
java      19105      tigase  106u  IPv6 117399401      0t0  TCP t2.tigase.org:8080->spider-5-255-250-93.yandex.com:55026 (CLOSE_WAIT)
java      19105      tigase  109u  IPv6 117532948      0t0  TCP t2.tigase.org:8080->spider-5-255-250-71.yandex.com:60025 (CLOSE_WAIT)
java      19105      tigase  111u  IPv6 115596105      0t0  TCP t2.tigase.org:8080->spider-5-255-250-88.yandex.com:39489 (CLOSE_WAIT)
java      19105      tigase  113u  IPv6 115595568      0t0  TCP t2.tigase.org:8080->spider-141-8-143-154.yandex.com:62315 (CLOSE_WAIT)
java      19105      tigase  115u  IPv6 115589101      0t0  TCP t2.tigase.org:8080->spider-5-255-250-93.yandex.com:49932 (CLOSE_WAIT)
java      19105      tigase  116u  IPv6 115595835      0t0  TCP t2.tigase.org:8080->spider-5-255-250-6.yandex.com:45020 (CLOSE_WAIT)
java      19105      tigase  117u  IPv6 116137483      0t0  TCP t2.tigase.org:8080->spider-141-8-143-239.yandex.com:41081 (CLOSE_WAIT)
java      19105      tigase  125u  IPv6 116137743      0t0  TCP t2.tigase.org:8080->spider-100-43-91-29.yandex.com:62804 (CLOSE_WAIT)
java      19105      tigase  126u  IPv6 116141392      0t0  TCP t2.tigase.org:8080->spider-5-255-250-3.yandex.com:36610 (CLOSE_WAIT)
java      19105      tigase  141u  IPv6 114905579      0t0  TCP *:8080 (LISTEN)
java      19105      tigase  149u  IPv6 116149463      0t0  TCP t2.tigase.org:8080->spider-5-255-250-71.yandex.com:42442 (CLOSE_WAIT)
java      19105      tigase  157u  IPv6 117533415      0t0  TCP t2.tigase.org:8080->spider-141-8-143-163.yandex.com:35321 (CLOSE_WAIT)
java      19105      tigase  358u  IPv6 117530355      0t0  TCP t2.tigase.org:8080->spider-5-255-250-6.yandex.com:46767 (CLOSE_WAIT)
java      19105      tigase  476u  IPv6 117402079      0t0  TCP t2.tigase.org:8080->spider-141-8-143-154.yandex.com:55653 (CLOSE_WAIT)
java      19105      tigase  477u  IPv6 117403836      0t0  TCP t2.tigase.org:8080->spider-141-8-143-160.yandex.com:48005 (CLOSE_WAIT)
root@t2 ~/ # 

Andrzej, there are also meta tags we can use. Would that be easy to add? If so, we should use nofollow as well.

Andrzej Wójcik (Tigase) commented 8 years ago

I'm not sure we can do anything more at this point. I switched t2.tigase.org to use Jetty.

As for meta tags and nofollow, they will be hard to use (if possible at all) because we do not return HTML: we usually return XML in our own format or JSON (HTML is returned only in a few exceptional cases).

Eric, please verify whether we still have this issue with Jetty. If so, we will need to take care of it, but on Sunday I'm starting my holiday and will be back after about a week.

Eric Dziewa commented 8 years ago

I've looked through the logs since August and have not seen any googlebot on port 8080. I think it is safe to assume Jetty has solved the issue.

Type: New Feature
Priority: Normal
Assignee:
RedmineID: 4464
Reference: tigase/_server/tigase-http-api#32