Intermittent domain unavailability

Discussion in 'General' started by smokinjo, Mar 15, 2018.

  1. smokinjo

    smokinjo Member HowtoForge Supporter

    For a while now, there have been times when my websites and email are not available on the server. They simply stop working.
    I have a service that checks my server, and within 30 seconds of it going down I get an email saying so. The same happens when it comes back up.
    It also seems to happen when the server is using most of its RAM. I have 16GB of RAM. It worked on 8GB until this started happening, so I bumped it up to 16GB, but to no effect. The RAM is slowly used up over the course of two or a few days. Then the domains are sporadically unavailable. I was under the impression that 8GB was plenty of RAM. I only have 4 domains and 7 websites, maybe 12 emails, and about 15 MySQL databases (all less than 20MB each). Not a busy or full computer.

    I have included a copy of a reading from the "top" command:


    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    3287 mysql 20 0 6251m 622m 7812 S 33.7 3.9 7:53.96 mysqld
    4249 vmail 20 0 38216 3360 2452 R 4.7 0.0 0:01.30 imap
    4360 web11 20 0 343m 64m 28m R 4.4 0.4 0:42.13 php-cgi
    4028 web13 20 0 393m 109m 29m R 4.1 0.7 1:00.28 php-cgi
    4297 web11 20 0 344m 65m 28m R 4.1 0.4 0:44.67 php-cgi
    4309 web13 20 0 399m 115m 29m R 4.1 0.7 0:59.60 php-cgi
    4343 web11 20 0 344m 65m 28m R 4.1 0.4 0:41.50 php-cgi
    4346 web5 20 0 327m 47m 28m R 4.1 0.3 0:08.06 php-cgi
    4402 web13 20 0 401m 118m 29m R 4.1 0.7 0:57.29 php-cgi
    4434 web11 20 0 344m 65m 28m R 4.1 0.4 0:41.94 php-cgi
    4673 web11 20 0 343m 64m 28m R 4.1 0.4 0:41.26 php-cgi
    4007 web11 20 0 344m 65m 28m R 3.8 0.4 0:42.84 php-cgi
    4083 web11 20 0 344m 65m 28m R 3.8 0.4 0:44.13 php-cgi
    4114 web11 20 0 346m 65m 28m R 3.8 0.4 0:44.21 php-cgi
    4169 web13 20 0 381m 96m 29m R 3.8 0.6 1:00.56 php-cgi
    4181 web13 20 0 390m 104m 29m R 3.8 0.6 1:03.44 php-cgi
    4299 web11 20 0 344m 65m 28m R 3.8 0.4 0:44.08 php-cgi
    4323 web13 20 0 383m 98m 29m R 3.8 0.6 0:59.12 php-cgi
    4398 web11 20 0 344m 65m 28m R 3.8 0.4 0:41.67 php-cgi
    4401 web11 20 0 344m 65m 28m R 3.8 0.4 0:42.82 php-cgi
    4584 web11 20 0 346m 65m 28m R 3.8 0.4 0:42.48 php-cgi
    4602 web11 20 0 344m 65m 28m R 3.8 0.4 0:41.39 php-cgi
    4604 web11 20 0 343m 64m 28m R 3.8 0.4 0:41.25 php-cgi
    4632 web11 20 0 344m 65m 28m R 3.8 0.4 0:41.41 php-cgi
    4718 web11 20 0 344m 65m 28m R 3.8 0.4 0:42.67 php-cgi
    4744 web11 20 0 344m 65m 28m R 3.8 0.4 0:43.12 php-cgi
    4763 web11 20 0 344m 65m 28m R 3.8 0.4 0:42.54 php-cgi
    4785 web11 20 0 344m 65m 28m R 3.8 0.4 0:42.28 php-cgi

    We can see that php-cgi is all over the place. This disappears when the websites are available.

    Might someone have an idea of why this is happening, and how to fix it?

    The only way to fix it is to reboot the server.

    Thanks

    Joseph
     
  2. till

    till Super Moderator Staff Member ISPConfig Developer

    Try to comment out the CustomLog line in the apache ispconfig.conf file and restart apache. Does this solve your problem? If yes, then there must be some kind of issue that makes vlogger hang, which then uses up all resources.
     
  3. smokinjo

    smokinjo Member HowtoForge Supporter

    Thanks for the idea.
    Just to make sure that I modify the right file, I found three to choose from:
    /etc/apache2/sites-available/ispconfig.conf
    /etc/apache2/sites-enabled/000-ispconfig.conf
    /usr/local/ispconfig/server/conf/apache_ispconfig.conf.master

    I presume that it is the first one, as it is the exact name you gave me.

    Thanks for letting me know which one.

    Thanks
    Joseph
     
  4. smokinjo

    smokinjo Member HowtoForge Supporter

    Note: I opened all of them, and they seem to have the same contents
     
  5. till

    till Super Moderator Staff Member ISPConfig Developer

    I mean this one: /etc/apache2/sites-available/ispconfig.conf

    This /etc/apache2/sites-enabled/000-ispconfig.conf is just a symlink to the first file and this
    /usr/local/ispconfig/server/conf/apache_ispconfig.conf.master is used by ISPConfig to update the first file.

    As this is just a test to get closer to the source of the problem, there should be no need to apply it permanently to
    /usr/local/ispconfig/server/conf/apache_ispconfig.conf.master now, especially as apache access logging won't work anymore. So just remove or comment out the line as a test to see if that helps.
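
    The edit itself is just a "#" in front of the directive. As a minimal sketch of that transformation, the function below comments out CustomLog lines in a piece of configuration text. It only works on a string and touches no files; the two-line sample is a hypothetical excerpt, not the real contents of ispconfig.conf.

```python
def comment_out_customlog(config_text: str) -> str:
    """Return config_text with every active CustomLog directive commented out."""
    out = []
    for line in config_text.splitlines():
        stripped = line.lstrip()
        # Apache directive names are case-insensitive. Lines that are already
        # comments start with "#", so they do not match and are left alone.
        if stripped.lower().startswith("customlog"):
            indent = line[: len(line) - len(stripped)]
            out.append(f"{indent}# {stripped}")
        else:
            out.append(line)
    return "\n".join(out)


# Hypothetical excerpt for illustration only.
sample = """\
ServerTokens Prod
CustomLog /var/log/example/access.log combined
"""

print(comment_out_customlog(sample))
```

    On the server the same change would be made by hand in /etc/apache2/sites-available/ispconfig.conf, followed by an apache restart, and reverted once the test is done.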
     
  6. smokinjo

    smokinjo Member HowtoForge Supporter

    I changed only the first one.
    After restarting apache, the web pages still do not work.
    Within 2 minutes, RAM usage went from 3.3GB after a fresh reboot to 9.6GB.
    mysql is the first PID when using top. After that, many, many php-cgi processes are still listed.
    Other possibilities?
    Thanks
    Joseph
     
  7. till

    till Super Moderator Staff Member ISPConfig Developer

    Ok, so undo this first and restart apache. The site which seems to cause this is web11 (look at which user owns the processes that you see the most), so take a look into the access.log of that site to see if there are any unusual requests causing this.
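
    To make the "which user owns the most processes" check concrete, here is a minimal sketch that tallies the USER column of top output. The function name is made up for illustration, and the sample rows are copied from the listing in the first post.

```python
from collections import Counter


def count_processes_by_user(top_output: str) -> Counter:
    """Tally processes per USER column in top-style output."""
    counts = Counter()
    for line in top_output.splitlines():
        fields = line.split()
        # Process rows have 12 columns and start with a numeric PID;
        # the header row and blank lines are skipped.
        if len(fields) >= 12 and fields[0].isdigit():
            counts[fields[1]] += 1
    return counts


sample = """\
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3287 mysql 20 0 6251m 622m 7812 S 33.7 3.9 7:53.96 mysqld
4360 web11 20 0 343m 64m 28m R 4.4 0.4 0:42.13 php-cgi
4028 web13 20 0 393m 109m 29m R 4.1 0.7 1:00.28 php-cgi
4297 web11 20 0 344m 65m 28m R 4.1 0.4 0:44.67 php-cgi
"""

for user, n in count_processes_by_user(sample).most_common():
    print(user, n)
```

    Run against the full listing, the user with the highest count is the site to look at first.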
     
  8. smokinjo

    smokinjo Member HowtoForge Supporter

    I uncommented the CustomLog line.
    I restarted apache.
    I know which website is assigned to the user web11.
    I located the web11 folder on the server.
    But where is the access.log file?
    Thanks
    Joseph
     
  9. smokinjo

    smokinjo Member HowtoForge Supporter

    OK, I found it.

    But, I will be honest and say that I do not know what "unusual" means.

    Should I copy some of the lines in the file, or upload the file to the discussion board?

    Thanks

    Joseph
     
  10. smokinjo

    smokinjo Member HowtoForge Supporter

    OK, I did see something. Every one to a few seconds, there is a call for the same web page:
    49.83.161.137 - - [15/Mar/2018:05:26:12 -0400] "POST /kr/a-propos-de-nous-2/?share=email&nb=1 HTTP/1.1" 302 700 "http://www.eco2haiti.com/k$
    180.104.52.190 - - [15/Mar/2018:05:26:12 -0400] "GET /kr/a-propos-de-nous-2/?shared=email HTTP/1.1" 200 18278 "http://www.eco2haiti.com/kr/$
    114.234.191.235 - - [15/Mar/2018:05:26:13 -0400] "POST /kr/a-propos-de-nous-2/?share=email&nb=1 HTTP/1.1" 302 700 "http://www.eco2haiti.com$
    180.104.52.228 - - [15/Mar/2018:05:26:15 -0400] "POST /kr/a-propos-de-nous-2/?share=email&nb=1 HTTP/1.1" 302 700 "http://www.eco2haiti.com/$
    106.87.96.192 - - [15/Mar/2018:05:26:17 -0400] "POST /kr/a-propos-de-nous-2/?share=email&nb=1 HTTP/1.1" 302 700 "http://www.eco2haiti.com/k$
    114.234.191.235 - - [15/Mar/2018:05:26:18 -0400] "GET /kr/a-propos-de-nous-2/?shared=email HTTP/1.1" 200 18278 "http://www.eco2haiti.com/kr$
    49.83.161.137 - - [15/Mar/2018:05:26:17 -0400] "GET /kr/a-propos-de-nous-2/?shared=email HTTP/1.1" 200 18278 "http://www.eco2haiti.com/kr/a$
    180.104.52.228 - - [15/Mar/2018:05:26:18 -0400] "GET /kr/a-propos-de-nous-2/?shared=email HTTP/1.1" 200 18278 "http://www.eco2haiti.com/kr/$
    106.87.96.192 - - [15/Mar/2018:05:26:20 -0400] "GET /kr/a-propos-de-nous-2/?shared=email HTTP/1.1" 200 18278 "http://www.eco2haiti.com/kr/a$
    49.83.173.112 - - [15/Mar/2018:05:26:20 -0400] "POST /kr/a-propos-de-nous-2/?share=email&nb=1 HTTP/1.1" 302 700 "http://www.eco2haiti.com/k$
    114.236.80.156 - - [15/Mar/2018:05:26:21 -0400] "POST /kr/a-propos-de-nous-2/?share=email&nb=1 HTTP/1.1" 302 700 "http://www.eco2haiti.com/$
    49.81.51.21 - - [15/Mar/2018:05:26:24 -0400] "POST /kr/a-propos-de-nous-2/?share=email&nb=1 HTTP/1.1" 302 700 "http://www.eco2haiti.com/kr/$
    114.236.80.156 - - [15/Mar/2018:05:26:24 -0400] "GET /kr/a-propos-de-nous-2/?shared=email HTTP/1.1" 200 18278 "http://www.eco2haiti.com/kr/$
    49.83.164.182 - - [15/Mar/2018:05:26:24 -0400] "POST /kr/a-propos-de-nous-2/?share=email&nb=1 HTTP/1.1" 302 700 "http://www.eco2haiti.com/k$
    49.83.173.112 - - [15/Mar/2018:05:26:25 -0400] "GET /kr/a-propos-de-nous-2/?shared=email HTTP/1.1" 200 18278 "http://www.eco2haiti.com/kr/a$
    106.87.96.176 - - [15/Mar/2018:05:26:27 -0400] "POST /kr/a-propos-de-nous-2/?share=email&nb=1 HTTP/1.1" 302 700 "http://www.eco2haiti.com/k$
    106.87.96.176 - - [15/Mar/2018:05:26:27 -0400] "POST /kr/a-propos-de-nous-2/?share=email&nb=1 HTTP/1.1" 302 700 "http://www.eco2haiti.com/k$
    49.81.51.21 - - [15/Mar/2018:05:26:27 -0400] "GET /kr/a-propos-de-nous-2/?shared=email HTTP/1.1" 200 18278 "http://www.eco2haiti.com/kr/a-p$
    49.81.232.95 - - [15/Mar/2018:05:26:29 -0400] "POST /kr/a-propos-de-nous-2/?share=email&nb=1 HTTP/1.1" 302 700 "http://www.eco2haiti.com/kr$
    49.83.164.182 - - [15/Mar/2018:05:26:29 -0400] "GET /kr/a-propos-de-nous-2/?shared=email HTTP/1.1" 200 18278 "http://www.eco2haiti.com/kr/a$

    It repeats and repeats. This website gets virtually no traffic and we were going to shut it down eventually.

    Does this info help any?
    Joseph
     
  11. till

    till Super Moderator Staff Member ISPConfig Developer

    My guess is that the site might be hacked and e.g. sends out spam or does other nasty things, or you are getting a DoS. So many POST requests are really unusual. To sum it up, your server setup might be fine; the system is just overloaded by a DoS or a hacker calling a script with the intention to do something nasty or to overload your server, or it's maybe some kind of mining software. Try to scan the site with e.g. https://ispprotect.com/, you can use ISPProtect for free on the first run.
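
    A quick way to see how lopsided the traffic is: count requests per method and URL, and count distinct client IPs. This is a minimal sketch, assuming the access.log uses the Apache common/combined layout seen in the lines posted above; the regular expression and function name are illustrative, and the sample lines are shortened copies of those log entries (the cut-off referer part is left out).

```python
import re
from collections import Counter

# Start of a common/combined log line:
# client IP, identity, user, [timestamp], "METHOD path PROTOCOL", status.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def summarize(lines):
    """Return (Counter of (method, path), set of client IPs)."""
    by_request = Counter()
    ips = set()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None:
            continue  # skip lines that do not look like log entries
        by_request[(m["method"], m["path"])] += 1
        ips.add(m["ip"])
    return by_request, ips


sample = [
    '49.83.161.137 - - [15/Mar/2018:05:26:12 -0400] "POST /kr/a-propos-de-nous-2/?share=email&nb=1 HTTP/1.1" 302 700',
    '180.104.52.190 - - [15/Mar/2018:05:26:12 -0400] "GET /kr/a-propos-de-nous-2/?shared=email HTTP/1.1" 200 18278',
    '114.234.191.235 - - [15/Mar/2018:05:26:13 -0400] "POST /kr/a-propos-de-nous-2/?share=email&nb=1 HTTP/1.1" 302 700',
]

by_request, ips = summarize(sample)
for (method, path), n in by_request.most_common(5):
    print(n, method, path)
print(len(ips), "distinct client IPs")
```

    One URL dominating the counts while the requests come from many different IPs matches the pattern in the posted log.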
     
  12. smokinjo

    smokinjo Member HowtoForge Supporter

    Hello again,

    I installed ISPProtect.
    It asks me for a path.
    What do I use? The path /var/www/?
    Joseph
     
  13. till

    till Super Moderator Staff Member ISPConfig Developer

    /var/www should be fine if you want to scan all sites. If you want to scan just this one site, then use its real path like:

    /var/www/clients/client1/web1

    instead. Replace the IDs in the path to match that site.
     
  14. smokinjo

    smokinjo Member HowtoForge Supporter

    OK, I decided to check all sites, just in case.

    I will get back to you with the results.
    It says that the results will be in a bunch of different files.
    Thanks
    Joseph
     
  15. smokinjo

    smokinjo Member HowtoForge Supporter

    One question.
    It seems that the web11 user/site is the active one.
    In theory, if I erase the site, the infection will be erased along with it.
    I could just remake the website from scratch, which would be fine.
    I have never erased a domain. Is it as simple as removing it from ISPConfig? Or do I need to go on the command line and erase a few things?
    I noticed that within 1 minute of restarting apache, all the php-cgi PIDs pop up again.

    Btw, how long will it take to scan my server when I used /var/www as the directory? I have 5 small websites, but I do have a couple of CRMs in 3 of the subfolders. Maybe 12 emails and the same number of databases. I know that you cannot give an exact time, but even a general idea would be great.
    I just saw this:
    Scanning 246751 files now ...
    Scan level 1: 1% completed. 0 hits. [ETA 66:14:51]
    I think that I have my answer.

    Might it be taking longer because of the overload on the server?
    I ask, because normally, this unavailability goes in spurts, and it might calm down later. Maybe it will speed up...

    Thanks

    Joseph
     
  16. smokinjo

    smokinjo Member HowtoForge Supporter

    I noticed that apache was the one draining all the resources.
    So, I turned it off.
    Now the scan time is dropping quickly! After turning off apache, it is now at 6 hours...
    It seems to be dropping yet more, so it should speed up.
    Thanks
    Joseph
     
  17. till

    till Super Moderator Staff Member ISPConfig Developer

    Scan time depends heavily on server load and the load on the hard disk. Instead of stopping apache completely, you should consider just unchecking the 'active' checkbox of the website web11, so just this site gets disabled temporarily.

    If the cause is a DoS, then you should consider putting Cloudflare in front of website web11. If I remember correctly, there is a free plan at Cloudflare.
     
  18. smokinjo

    smokinjo Member HowtoForge Supporter

    It was a long haul with the extended testing and the slowness due to the overactive service, but it did get done about 8 hours later :)

    I went through the ISPP software and I got a few hits. Some seem benign, but maybe I am unaware!

    I have a few installations of SuiteCRM, a fork of SugarCRM. For these installations, I get:
    Malware {ISPP}suspect.crypted.inflate in /var/www/clients/client2/web5/web/finance/modules/Users/authentication/SAML2Authenticate/lib/onelogin/php-saml/lib/Saml2/LogoutRequest.php

    Also in SuiteCRM, an upgrade patch seems to cause a flag to be raised:
    Malware {ISPP}suspect.eval.request in /var/www/clients/client2/web5/web/finance/upload/upgrades/patch/SuiteCRM-Upgrade-7.1.x-to-7.2.2-restore/modules/Configurator/views/view.addfontresult.php

    PHPList was also flagged:
    Malware {ISPP}suspect.globals.eval in /var/www/clients/client2/web13/web/phplist/admin/index.php
    Malware {ISPP}suspect.globals.eval in /var/www/clients/client2/web13/web/list/public_html/lists/admin/index.php

    Some WordPress sites need updating, so I will do that. I will go through and delete unneeded WordPress plugins. But apache needs to be accessible to change some of these things, and it is not.

    I erased a couple of other items that ispp suspected. Things I needed to erase anyway.

    The server is still unusable for now. Not able to access it via the web browser.

    Here is the latest top command report:
    top - 23:19:15 up 46 min, 3 users, load average: 139.50, 123.40, 105.20
    Tasks: 456 total, 93 running, 318 sleeping, 0 stopped, 45 zombie
    %Cpu(s): 50.0 us, 43.5 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 6.4 si, 0.0 st
    KiB Mem: 16474164 total, 3413700 used, 13060464 free, 85508 buffers
    KiB Swap: 0 total, 0 used, 0 free, 1245604 cached

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    3249 mysql 20 0 6109m 530m 7888 S 32.9 3.3 5:54.95 mysqld
    20 root rt 0 0 0 0 S 13.1 0.0 1:35.23 watchdog/3
    8218 root 20 0 35828 2180 1688 R 11.9 0.0 0:07.53 sendmail
    7896 root 20 0 304m 21m 12m R 11.3 0.1 0:15.56 php
    8235 root 20 0 33080 336 0 R 10.4 0.0 0:06.56 cron
    7282 vmail 20 0 37956 3156 2424 R 9.0 0.0 0:11.09 imap
    8232 root 20 0 37004 344 220 R 8.5 0.0 0:05.35 php
    8183 root 20 0 126m 5084 3468 R 7.7 0.0 0:07.07 php
    8205 root 20 0 68412 2276 1668 R 7.3 0.0 0:04.59 php
    8158 root 20 0 277m 10m 7256 R 7.2 0.1 0:06.51 php
    8224 root 20 0 76404 4268 2920 R 7.1 0.0 0:04.50 php
    7957 root 20 0 210m 8412 5836 R 6.7 0.1 0:11.46 php
    7953 root 20 0 315m 33m 12m R 6.5 0.2 0:13.17 php
    7907 root 20 0 245m 10m 7116 R 6.4 0.1 0:12.97 php
    8228 root 20 0 33080 340 0 R 5.8 0.0 0:03.64 cron
    8220 root 20 0 8 4 0 R 5.7 0.0 0:03.60 cron
    8163 root 20 0 189m 7656 5420 R 5.4 0.0 0:07.42 php

    The user web11 only showed up on one flag, which was one of the SuiteCRM hits mentioned above.

    Not sure if I am making progress or not :)

    Thanks

    Joseph
     
    Last edited: Mar 16, 2018
  19. smokinjo

    smokinjo Member HowtoForge Supporter

    I have rebooted and been monitoring my server, and things seem to have calmed down.
    I have not yet turned the website for user web11 back on.
    I noticed that user web13 seems to be active in the top output, so I took this website and web11 and subscribed them to Cloudflare.
    The DNS records are switched, but will take a few hours to propagate.
    I will turn the other site back on in a bit.

    Please comment on the flagged files to see what might be causing the resource overuse.

    Thanks
    Joseph
     
  20. smokinjo

    smokinjo Member HowtoForge Supporter

    Hello,
    I have removed the website for user web11 and user web13.
    Now, the computer crawls to a halt within 20 minutes and needs rebooting.
    I did notice that when using the top command, mysql seems to take up lots of RAM.
    VIRT - 5912m
    RES - 573m
    SHR - 7880
    Much more than any of the other processes I see using top.

    Might there be an issue here?

    Thanks
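
    For a rough sense of scale: in top, RES is the resident memory that counts against physical RAM, while VIRT is reserved address space. The sketch below converts the RES figure to a share of total RAM. It assumes top's usual units (no suffix means KiB, "m" means MiB) and takes the 16474164 KiB total from the earlier top report in this thread; the function names are illustrative.

```python
def top_size_to_kib(field: str) -> float:
    """Convert a top size field such as '573m' or '7880' to KiB."""
    units = {"k": 1, "m": 1024, "g": 1024 ** 2, "t": 1024 ** 3}
    suffix = field[-1].lower()
    if suffix in units:
        return float(field[:-1]) * units[suffix]
    return float(field)  # no suffix: already KiB


def percent_of_ram(res_field: str, total_kib: int) -> float:
    """Share of physical RAM held by a process, from its RES column."""
    return top_size_to_kib(res_field) / total_kib * 100


TOTAL_KIB = 16474164  # "KiB Mem" total from the earlier top report

print(f"mysqld RES 573m is {percent_of_ram('573m', TOTAL_KIB):.1f}% of RAM")
```

    The same calculation for the earlier 530m reading gives about 3.3%, which agrees with the %MEM column top printed for mysqld in that report.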
     
