    Zabbix, Nagios and other monitoring tools (SNIPERCZE)
    Zabbix - "Zabbix offers advanced monitoring, alerting and visualisation features today which are missing in other monitoring systems, even some of the best commercial ones." Nagios - "Nagios is a powerful IT management system that enables organizations to identify and resolve IT infrastructure problems before they affect critical business processes."
    ATAN --- ---

    There is something odd in the log, but I can't think of why it happens.

    This is from the report:
    Time	Host	Description	Status	Severity	Duration	Ack	Actions
    27 Feb 2013 12:37:30	sailman-1-01	Zabbix agent on sailman-1-01 is unreachable for 5 minutes	OK	High	4m 13s	No	
    27 Feb 2013 12:37:30	sailman-1-01	Zabbix agent on sailman-1-01 is unreachable for 5 minutes	PROBLEM	High	0s	No	

    This is from the log:
      9303:20130227:123017.330 Zabbix agent item [net.if.in[WAN Miniport (IKEv2)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9297:20130227:123019.397 Zabbix agent item [net.if.in[WAN Miniport (PPTP)]] on host [sailman-1-01] failed: another network error, wait for 15 seconds
      9301:20130227:123019.405 Zabbix agent item [perf_counter[\234(_Total)\1404]] on host [sailman-1-01] failed: another network error, wait for 15 seconds
      9304:20130227:123034.989 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9298:20130227:123038.463 Zabbix agent item [vfs.fs.size[E:,free]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9304:20130227:123053.081 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9297:20130227:123116.937 Zabbix agent item [net.if.in[WAN Miniport (SSTP)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9304:20130227:123131.183 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9297:20130227:123136.103 Zabbix agent item [net.if.out[WAN Miniport (PPTP)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123151.066 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9297:20130227:123206.612 Zabbix agent item [net.if.in[WAN Miniport (Network Monitor)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123222.227 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9301:20130227:123316.421 Zabbix agent item [net.if.in[WAN Miniport (SSTP)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9304:20130227:123331.278 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9303:20130227:123422.996 Zabbix agent item [net.if.out[WAN Miniport (IPv6)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123437.333 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9298:20130227:123440.492 Zabbix agent item [vfs.fs.size[D:,pfree]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9304:20130227:123458.317 Zabbix agent item [vfs.fs.size[D:,pfree]] on host [sailman-1-01] failed: another network error, wait for 15 seconds
      9304:20130227:123513.377 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9301:20130227:123518.333 Zabbix agent item [net.if.in[WAN Miniport (L2TP)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123533.453 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9303:20130227:123623.620 Zabbix agent item [proc.num[]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9304:20130227:123638.687 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9297:20130227:123713.350 Zabbix agent item [net.if.in[WAN Miniport (Network Monitor)-QoS Packet Scheduler-0000]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123728.556 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9301:20130227:123818.554 Zabbix agent item [perf_counter[\234(_Total)\1402]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123834.632 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
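
To see how long each interruption in a log like this actually lasts, the entries can be paired up: each "first network error" opens an outage for the host and the next "connection restored" closes it. The following is only a sketch in Python, assuming the line format shown above (`PID:YYYYMMDD:HHMMSS.mmm message`); the function name `outages` and the `SAMPLE` constant are made up for this illustration, and the sample is simply the first lines of the log.

```python
import re
from datetime import datetime

# Line layout assumed from the log excerpt: PID:YYYYMMDD:HHMMSS.mmm message
LINE_RE = re.compile(r"^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3})\s+(?P<msg>.*)$")
FAIL_RE = re.compile(
    r"^Zabbix agent item \[(?P<key>.+)\] on host \[(?P<host>[^\]]+)\] "
    r"failed: (?:first|another) network error"
)
RESUME_RE = re.compile(
    r"^resuming Zabbix agent checks on host \[(?P<host>[^\]]+)\]: connection restored"
)

SAMPLE = r"""
  9303:20130227:123017.330 Zabbix agent item [net.if.in[WAN Miniport (IKEv2)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
  9297:20130227:123019.397 Zabbix agent item [net.if.in[WAN Miniport (PPTP)]] on host [sailman-1-01] failed: another network error, wait for 15 seconds
  9301:20130227:123019.405 Zabbix agent item [perf_counter[\234(_Total)\1404]] on host [sailman-1-01] failed: another network error, wait for 15 seconds
  9304:20130227:123034.989 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
  9298:20130227:123038.463 Zabbix agent item [vfs.fs.size[E:,free]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
  9304:20130227:123053.081 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
"""


def outages(log_text):
    """Pair failure lines with the next 'connection restored' line per host."""
    open_by_host = {}  # host -> outage that has started but not yet ended
    finished = []
    for raw in log_text.splitlines():
        line = LINE_RE.match(raw)
        if not line:
            continue  # blank or unrelated line
        stamp = datetime.strptime(
            line["date"] + " " + line["time"], "%Y%m%d %H%M%S.%f"
        )
        fail = FAIL_RE.match(line["msg"])
        if fail:
            # The first failure sets the start time; later ones only add keys.
            entry = open_by_host.setdefault(
                fail["host"], {"host": fail["host"], "start": stamp, "keys": []}
            )
            entry["keys"].append(fail["key"])
            continue
        resume = RESUME_RE.match(line["msg"])
        if resume and resume["host"] in open_by_host:
            entry = open_by_host.pop(resume["host"])
            entry["end"] = stamp
            entry["seconds"] = (stamp - entry["start"]).total_seconds()
            finished.append(entry)
    return finished
```

Calling `outages(SAMPLE)` gives one entry per interruption with its start, end, duration in seconds and the item keys that failed, which makes it easier to compare the outages against the trigger's 5-minute window.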
    SNIPERCZE --- ---
    When you look at the log on the Zabbix server, are there any messages for this particular machine, or is it clean?
    ATAN --- ---
    On the client I also run a batch file, started from the server via system.run, that pings the Zabbix server's domain name, and that check never fails (it pings 5 times, and if not a single ping succeeds it reports a state that activates the trigger; tested, the trigger works).
    ATAN --- ---
    It is under practically no load at all, and as far as I can see, the data behind other triggers have gaps too.

    For example, processor load 1min avg should have a value for every minute:

    2013.Feb.26 14:37:21 0.0333
    2013.Feb.26 14:36:21 0.0667
    2013.Feb.26 14:35:21 0.0167
    2013.Feb.26 14:34:36 0
    2013.Feb.26 14:32:21 0.0167
    2013.Feb.26 14:31:21 0.0167
    2013.Feb.26 14:28:21 0
    2013.Feb.26 14:26:21 0.0167
    2013.Feb.26 14:25:21 0.0167
    2013.Feb.26 14:24:21 0.0167
    2013.Feb.26 14:22:21 0.0167
    2013.Feb.26 14:21:21 0.0167
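
The missing minutes in a history like this can be listed mechanically instead of by eye. Below is a small Python sketch, assuming the `YYYY.Mon.DD HH:MM:SS value` layout shown above, English month abbreviations, and a 60-second item interval; the names `parse_history` and `find_gaps` and the 30-second tolerance are choices made for this illustration only.

```python
from datetime import datetime, timedelta

HISTORY = """\
2013.Feb.26 14:37:21 0.0333
2013.Feb.26 14:36:21 0.0667
2013.Feb.26 14:35:21 0.0167
2013.Feb.26 14:34:36 0
2013.Feb.26 14:32:21 0.0167
2013.Feb.26 14:31:21 0.0167
2013.Feb.26 14:28:21 0
2013.Feb.26 14:26:21 0.0167
2013.Feb.26 14:25:21 0.0167
2013.Feb.26 14:24:21 0.0167
2013.Feb.26 14:22:21 0.0167
2013.Feb.26 14:21:21 0.0167
"""


def parse_history(text):
    """Extract the timestamps from 'date time value' lines."""
    stamps = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything that is not a history row
        stamps.append(
            datetime.strptime(parts[0] + " " + parts[1], "%Y.%b.%d %H:%M:%S")
        )
    return stamps


def find_gaps(stamps, interval=timedelta(seconds=60),
              tolerance=timedelta(seconds=30)):
    """Return (previous, current, delta) for samples further apart than expected."""
    ordered = sorted(stamps)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        delta = cur - prev
        if delta > interval + tolerance:
            gaps.append((prev, cur, delta))
    return gaps
```

For the rows above, `find_gaps(parse_history(HISTORY))` reports four gaps, the longest being the three minutes between 14:28:21 and 14:31:21.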
    SNIPERCZE --- ---
    Could the machine be overloaded, so the agent can't respond before the timeout?
    ATAN --- ---
    I'm dealing with one Zabbix client right now. An agent runs on a 64-bit Windows 2008 machine, and several times a day its "agent unreachable" trigger fires and then clears immediately (the log gives the event duration as 0s). The machine is on the Internet behind NAT and the agent port is forwarded. As far as I can tell it is completely random, but on average it happens about once an hour. It doesn't happen with the agents on the other PCs (Windows and Linux). Does anyone know what could be causing it?
    SAMGARR --- ---
    Is anyone here using RealOpInsight (http://realopinsight.com/en/index.php)? I'm trying to compile it on Arch Linux and keep ending up with an error...
    DRON --- ---
    Hell... I just made a typo and wrote #0 instead of #1... well, at least it shows who actually reads those alarms :-)))
    DRON --- ---
    SAMGARR: thanks. While googling nodata() I came across https://www.zabbix.com/forum/showpost.php?p=124666&postcount=4 which is exactly the workaround for the vanished status item.
    SAMGARR --- ---
    DRON: I use nodata() for triggers like that.
    DRON --- ---
    I haven't looked into it much yet, but is it just me, or did Zabbix 2.0 drop support for the status.last() expression? I always had something like {server:status.last(0)}=2 in my triggers, which checked that at least some update of any item had arrived from the server, so the server is alive. I rather miss it now.
    SAMGARR --- ---
    SNIPERCZE: alternatively, use the zabbix_get utility.
    SNIPERCZE --- ---
    Monitor whether the zabbix-server or zabbix-agentd process is running.
    AQUARIUS --- ---
    At work they installed Zabbix for us as part of one solution. Since we use Nagios everywhere else, I need advice on how to monitor Zabbix (both the server and the agents) from Nagios. Is there an existing solution for this? I can write the plugins myself if need be.
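
One simple starting point for such a plugin is a plain TCP reachability check against the Zabbix ports. The Python sketch below is not an existing plugin; it only assumes the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and that the default Zabbix ports are in use (10050 for the agent, 10051 for the server). The names `check_tcp` and `main`, the script name in the usage text, and the 1-second warning threshold are made up for this example.

```python
import socket
import time

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=5.0, warn_after=1.0):
    """Try to open a TCP connection and map the result to a Nagios state."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
    except OSError as exc:
        # Refused, timed out, unresolvable host name, etc.
        return CRITICAL, f"CRITICAL - cannot connect to {host}:{port} ({exc})"
    perfdata = f"time={elapsed:.3f}s"
    if elapsed > warn_after:
        return WARNING, f"WARNING - {host}:{port} slow to accept|{perfdata}"
    return OK, f"OK - {host}:{port} accepted the connection|{perfdata}"


def main(argv):
    """argv is [HOST, PORT]; prints one status line and returns the exit code."""
    if len(argv) != 2 or not argv[1].isdigit():
        print("UNKNOWN - usage: check_zabbix_tcp.py HOST PORT")
        return UNKNOWN
    code, message = check_tcp(argv[0], int(argv[1]))
    print(message)
    return code
```

To run it as a plugin, end the script with `import sys` and `sys.exit(main(sys.argv[1:]))`. Note that this only shows the port accepts connections; it does not prove that the agent answers item requests, which is what the zabbix_get suggestion in the replies covers.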
    SNIPERCZE --- ---
    ZKOUMAL: I'd try it through the user's media settings, or with the escalation SAMGARR describes.
    SAMGARR --- ---
    ZKOUMAL: I'd probably solve it by escalating the action, i.e. steps 1 - 5 send an SMS, then do nothing.
    ZKOUMAL --- ---
    In Zabbix I have server monitoring set up, and the action is configured to send a notification every hour between 7 and 22.

    But if the server goes down at 21:00 and nobody acknowledges it, it keeps sending SMS until morning.
    Maintenance status not in "maintenance"
    Time period in "1-7,07:30-22:00"
    Trigger value = "PROBLEM"
    Trigger severity >= "Average"
    Host group = "xxxx"

    The point is that it's not such an important machine, and Zabbix doesn't need to keep reporting it until morning. Is there a way around this?
    INDIAN --- ---
    SAMGARR --- ---
    SNIPERCZE: great! Now just the other 136 things about that GUI that piss me off beyond belief :D
    SNIPERCZE --- ---
    HEXXX: big like, maybe it will stop crashing my browser here.
    HEXXX --- ---
    Homepage of Zabbix :: An Enterprise-Class Open Source Distributed Monitoring Solution

    What's New in 2.0.3
    :: Flicker free screens
    Screens in monitoring do not refresh the whole page, but all the individual elements are reloaded in the background and replaced.
