    SNIPERCZE: Zabbix, Nagios and other monitoring tools
    ATAN
    ATAN --- ---
    DRON: I installed the agent on a second, identical machine behind the same NAT on that network, and it behaves the same way, so the problem is somewhere in the connection, most likely at the NAT. Active checks run without any problems though, so I'm just ignoring the rest.
    DRON
    DRON --- ---
    ATAN: Ah, then never mind. You have it at 5m (sorry, I see I keep dragging ancient items and triggers over from ancient Zabbix installs). Wouldn't active checks help behind that NAT? That is, if the problem could be in the port forwarding.
    ATAN
    ATAN --- ---
    I changed the settings a different way, following http://www.slideshare.net/xsbr/alexei-vladishev-zabbixperformancetuning, and it still didn't help.

    I also found out that the queue contains delayed items from this client (and only this client).
    QUEUE OF ITEMS TO BE UPDATED	
     	
    Next check	Delayed by	Host	Name
    27 Feb 2013 12:35:31	1h 31m 41s	sailman-1-01	Version of zabbix_agent(d) running
    27 Feb 2013 12:35:32	1h 31m 40s	sailman-1-01	Host name of zabbix_agentd running
    27 Feb 2013 13:35:39	31m 33s	sailman-1-01	Total disk space on C:
    27 Feb 2013 13:35:40	31m 32s	sailman-1-01	Total disk space on D:
    27 Feb 2013 13:35:41	31m 31s	sailman-1-01	Total disk space on E:
    27 Feb 2013 14:05:17	1m 55s	sailman-1-01	File read bytes per second
    27 Feb 2013 14:05:20	1m 52s	sailman-1-01	Number of processes
    27 Feb 2013 14:05:21	1m 51s	sailman-1-01	Outgoing network traffic on Intel(R) PRO/1000 MT Network Connection-QoS Packet Scheduler-0000
    27 Feb 2013 14:05:21	1m 51s	sailman-1-01	Processor load (1 min average)
    27 Feb 2013 14:05:22	1m 50s	sailman-1-01	Outgoing network traffic on Intel(R) PRO/1000 MT Network Connection-WFP LightWeight Filter-0000
    27 Feb 2013 14:05:22	1m 50s	sailman-1-01	Processor load (15 min average)
    27 Feb 2013 14:05:23	1m 49s	sailman-1-01	Outgoing network traffic on WAN Miniport (IPv6)-QoS Packet Scheduler-0000
    27 Feb 2013 14:05:23	1m 49s	sailman-1-01	Processor load (5 min average)
    27 Feb 2013 14:05:24	1m 48s	sailman-1-01	Free swap space
    27 Feb 2013 14:05:24	1m 48s	sailman-1-01	Outgoing network traffic on Intel(R) PRO/1000 MT Network Connection
    27 Feb 2013 14:05:25	1m 47s	sailman-1-01	Outgoing network traffic on WAN Miniport (IP)
    27 Feb 2013 14:05:26	1m 46s	sailman-1-01	Outgoing network traffic on WAN Miniport (IP)-QoS Packet Scheduler-0000
    27 Feb 2013 14:05:27	1m 45s	sailman-1-01	Outgoing network traffic on WAN Miniport (Network Monitor)-QoS Packet Scheduler-0000
    27 Feb 2013 14:05:27	1m 45s	sailman-1-01	System uptime
    27 Feb 2013 14:05:28	1m 44s	sailman-1-01	Free memory
    27 Feb 2013 14:05:28	1m 44s	sailman-1-01	Outgoing network traffic on WAN Miniport (PPPOE)
    27 Feb 2013 14:05:29	1m 43s	sailman-1-01	Outgoing network traffic on RAS Async Adapter
    27 Feb 2013 14:05:30	1m 42s	sailman-1-01	Agent ping
    27 Feb 2013 14:05:30	1m 42s	sailman-1-01	Outgoing network traffic on WAN Miniport (SSTP)
    27 Feb 2013 14:05:31	1m 41s	sailman-1-01	Outgoing network traffic on WAN Miniport (IKEv2)
    27 Feb 2013 14:05:32	1m 40s	sailman-1-01	Outgoing network traffic on WAN Miniport (L2TP)
    27 Feb 2013 14:05:33	1m 39s	sailman-1-01	Free disk space on C:
    27 Feb 2013 14:05:33	1m 39s	sailman-1-01	Outgoing network traffic on WAN Miniport (PPTP)
    27 Feb 2013 14:05:34	1m 38s	sailman-1-01	Free disk space on D:
    27 Feb 2013 14:05:34	1m 38s	sailman-1-01	Outgoing network traffic on Microsoft ISATAP Adapter
    
    ATAN
    ATAN --- ---
    It just happened again, 13 minutes later. And the PROBLEM state always lasts 0s.
    ATAN
    ATAN --- ---
    DRON:
    Which one? The "unreachable" trigger is the standard one from the template: {sailman-1-01:agent.ping.nodata(5m)}=1
    My own trigger on the agent side, which runs a batch file with a ping, works as it should: {sailman-1-01:system.run[c:\zabbix\ping.bat,wait].regexp(down)}#0
    ATAN
    ATAN --- ---
    SNIPERCZE:

    There is something odd in the log, but I can't think of why it happens.

    This is from the report:
    Time	Host	Description	Status	Severity	Duration	Ack	Actions
    27 Feb 2013 12:37:30	sailman-1-01	Zabbix agent on sailman-1-01 is unreachable for 5 minutes	OK	High	4m 13s	No	
    Ok
    27 Feb 2013 12:37:30	sailman-1-01	Zabbix agent on sailman-1-01 is unreachable for 5 minutes	PROBLEM	High	0s	No	
    Ok
    


    This is from the log:
      9303:20130227:123017.330 Zabbix agent item [net.if.in[WAN Miniport (IKEv2)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9297:20130227:123019.397 Zabbix agent item [net.if.in[WAN Miniport (PPTP)]] on host [sailman-1-01] failed: another network error, wait for 15 seconds
      9301:20130227:123019.405 Zabbix agent item [perf_counter[\234(_Total)\1404]] on host [sailman-1-01] failed: another network error, wait for 15 seconds
      9304:20130227:123034.989 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9298:20130227:123038.463 Zabbix agent item [vfs.fs.size[E:,free]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9304:20130227:123053.081 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9297:20130227:123116.937 Zabbix agent item [net.if.in[WAN Miniport (SSTP)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9304:20130227:123131.183 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9297:20130227:123136.103 Zabbix agent item [net.if.out[WAN Miniport (PPTP)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123151.066 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9297:20130227:123206.612 Zabbix agent item [net.if.in[WAN Miniport (Network Monitor)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123222.227 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9301:20130227:123316.421 Zabbix agent item [net.if.in[WAN Miniport (SSTP)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9304:20130227:123331.278 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9303:20130227:123422.996 Zabbix agent item [net.if.out[WAN Miniport (IPv6)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123437.333 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9298:20130227:123440.492 Zabbix agent item [vfs.fs.size[D:,pfree]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9304:20130227:123458.317 Zabbix agent item [vfs.fs.size[D:,pfree]] on host [sailman-1-01] failed: another network error, wait for 15 seconds
      9304:20130227:123513.377 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9301:20130227:123518.333 Zabbix agent item [net.if.in[WAN Miniport (L2TP)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123533.453 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9303:20130227:123623.620 Zabbix agent item [proc.num[]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9304:20130227:123638.687 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9297:20130227:123713.350 Zabbix agent item [net.if.in[WAN Miniport (Network Monitor)-QoS Packet Scheduler-0000]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123728.556 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9301:20130227:123818.554 Zabbix agent item [perf_counter[\234(_Total)\1402]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123834.632 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
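
To see whether the network errors in a log like the one above cluster around particular items or times, the lines can be tallied with a short script. This is only a sketch: the regular expression is written against the two message formats visible in this excerpt, and `summarize` and `SAMPLE` are illustrative names, not part of Zabbix.

```python
import re
from collections import Counter

# Matches the "failed: first/another network error" lines as they appear
# in the excerpt above; other server log formats are not handled.
ERROR_RE = re.compile(
    r"^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6})\.\d{3} "
    r"Zabbix agent item \[(?P<key>.+)\] on host \[(?P<host>[^\]]+)\] failed: "
    r"(?P<kind>first|another) network error"
)


def summarize(lines):
    """Count network-error lines per host and per minute (HHMM)."""
    per_host = Counter()
    per_minute = Counter()
    for line in lines:
        match = ERROR_RE.match(line)
        if match:
            per_host[match.group("host")] += 1
            per_minute[match.group("time")[:4]] += 1
    return per_host, per_minute


# Three lines copied from the log excerpt above.
SAMPLE = [
    "  9303:20130227:123017.330 Zabbix agent item [net.if.in[WAN Miniport (IKEv2)]] "
    "on host [sailman-1-01] failed: first network error, wait for 15 seconds",
    "  9297:20130227:123019.397 Zabbix agent item [net.if.in[WAN Miniport (PPTP)]] "
    "on host [sailman-1-01] failed: another network error, wait for 15 seconds",
    "  9304:20130227:123034.989 resuming Zabbix agent checks on host [sailman-1-01]: "
    "connection restored",
]

hosts, minutes = summarize(SAMPLE)
print(hosts.most_common())
print(sorted(minutes.items()))
```

Fed the whole server log, the per-minute counts would show whether the errors arrive at regular intervals (which might hint at a NAT or firewall timeout) or at random.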
    
    DRON
    DRON --- ---
    ATAN: How is that trigger written? Against the last value?
    SNIPERCZE
    SNIPERCZE --- ---
    When you look at the log on the Zabbix server, are there any messages for this particular machine, or is it clean?
    ATAN
    ATAN --- ---
    On the client I also run, from the server via system.run, a batch file that pings the Zabbix server's domain, and that never fails (it pings 5 times, and only if not a single ping succeeds does it send the state that activates the trigger; tested, the trigger works).
    ATAN
    ATAN --- ---
    It is under practically no load at all, and as far as I can see the data has gaps in other items as well.

    For example, processor load (1 min avg) should have a value for every minute:

    2013.Feb.26 14:37:21 0.0333
    2013.Feb.26 14:36:21 0.0667
    2013.Feb.26 14:35:21 0.0167
    2013.Feb.26 14:34:36 0
    2013.Feb.26 14:32:21 0.0167
    2013.Feb.26 14:31:21 0.0167
    2013.Feb.26 14:28:21 0
    2013.Feb.26 14:26:21 0.0167
    2013.Feb.26 14:25:21 0.0167
    2013.Feb.26 14:24:21 0.0167
    2013.Feb.26 14:22:21 0.0167
    2013.Feb.26 14:21:21 0.0167
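
The missing minutes in a history listing like the one above can be picked out mechanically. A minimal sketch, assuming the timestamp format shown in the post and a one-minute item interval; `find_gaps` and the 30-second slack are illustrative choices, not anything Zabbix provides.

```python
from datetime import datetime, timedelta

# Timestamps copied from the history listing above (newest first).
HISTORY = [
    "2013.Feb.26 14:37:21", "2013.Feb.26 14:36:21", "2013.Feb.26 14:35:21",
    "2013.Feb.26 14:34:36", "2013.Feb.26 14:32:21", "2013.Feb.26 14:31:21",
    "2013.Feb.26 14:28:21", "2013.Feb.26 14:26:21", "2013.Feb.26 14:25:21",
    "2013.Feb.26 14:24:21", "2013.Feb.26 14:22:21", "2013.Feb.26 14:21:21",
]


def find_gaps(stamps, interval=timedelta(minutes=1), slack=timedelta(seconds=30)):
    """Return (previous, current, delta) for every pair of consecutive
    samples that lie further apart than interval + slack."""
    # %b expects English month abbreviations (the default C locale).
    times = sorted(datetime.strptime(s, "%Y.%b.%d %H:%M:%S") for s in stamps)
    gaps = []
    for prev, cur in zip(times, times[1:]):
        if cur - prev > interval + slack:
            gaps.append((prev, cur, cur - prev))
    return gaps


for prev, cur, delta in find_gaps(HISTORY):
    print(f"{prev:%H:%M:%S} -> {cur:%H:%M:%S}  gap of {delta}")
```

Comparing the gap times with the "network error" lines in the server log would show whether each lost sample corresponds to a failed passive check.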
    SNIPERCZE
    SNIPERCZE --- ---
    Is the machine overloaded, so the agent can't respond within the timeout?
    ATAN
    ATAN --- ---
    I'm currently dealing with one Zabbix client. The agent runs on a 64-bit Windows 2008 machine, and several times a day its "agent unreachable" trigger fires and then immediately clears (the log gives the event duration as 0s). The machine is on the internet behind NAT, and the agent port is forwarded. It looks completely random to me, but on average it happens about once an hour. It doesn't happen with agents on the other PCs (Windows and Linux). Does anyone know what could be causing this?
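
One way to narrow down a problem like this is to test the forwarded port independently of Zabbix, by opening plain TCP connections from the server side and timing them. The sketch below only checks that a connection can be established; it does not speak the Zabbix agent protocol. `probe` is an illustrative name, and the host and port in the usage note are placeholders, since the thread does not state them.

```python
import socket
import time


def probe(host, port, timeout=3.0):
    """Try to open a TCP connection; return (ok, seconds_elapsed)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        ok = False
    return ok, time.monotonic() - start
```

Calling, say, `probe("agent.example.org", 10050)` in a loop every few seconds and logging the failures would show whether the connection drops line up with the moments the trigger fires. Port 10050 is the agent's default; the forwarded port on the NAT may differ.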
    SAMGARR
    SAMGARR --- ---
    Does anyone here use http://realopinsight.com/en/index.php? I'm trying to compile it on Arch Linux and keep ending up with an error...
    DRON
    DRON --- ---
    Hell... I just made a typo and wrote #0 instead of #1... well, at least now you can see who actually reads those alarms :-)))
    DRON
    DRON --- ---
    SAMGARR: Thanks. While googling nodata() I came across https://www.zabbix.com/forum/showpost.php?p=124666&postcount=4 which is exactly the workaround for the vanished status.
    SAMGARR
    SAMGARR --- ---
    DRON: For triggers like that I use nodata().
    DRON
    DRON --- ---
    I haven't looked into it much yet, but is it just me, or did support for the status.last() expression disappear in Zabbix 2.0? I always had something like {server:status.last(0)}=2 in my triggers, which checked that at least some update of any item had arrived from the server, so the server was alive. I rather miss it now.
    SAMGARR
    SAMGARR --- ---
    SNIPERCZE: Alternatively, use the zabbix_get utility.
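
The zabbix_get suggestion could be wrapped into a Nagios-style check roughly like this. It is a sketch rather than an existing plugin: `check_agent` is a made-up name, it assumes `zabbix_get` is on the PATH of the Nagios host, and the agent must list that host in its `Server=` setting or it will refuse the request. The return codes follow the usual Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_agent(host, port=10050, timeout=5, run=subprocess.run):
    """Query agent.ping through zabbix_get and map the result to a
    (exit_code, message) pair. `run` is injectable for testing."""
    cmd = ["zabbix_get", "-s", host, "-p", str(port), "-k", "agent.ping"]
    try:
        res = run(cmd, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        return UNKNOWN, "UNKNOWN - zabbix_get not found"
    except subprocess.TimeoutExpired:
        return CRITICAL, f"CRITICAL - no answer from {host}:{port} in {timeout}s"
    # A healthy agent answers agent.ping with "1".
    if res.returncode == 0 and res.stdout.strip() == "1":
        return OK, f"OK - agent on {host}:{port} answered agent.ping"
    detail = (res.stderr or res.stdout).strip()
    return CRITICAL, f"CRITICAL - unexpected reply from {host}:{port}: {detail}"
```

A real plugin would print the message and pass the code to `sys.exit()`. The Zabbix server itself does not answer `agent.ping` on its trapper port, so for the server a process check, as suggested in the post below, is probably the simpler route.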
    SNIPERCZE
    SNIPERCZE --- ---
    Monitor whether the zabbix-server process (or zabbix-agentd, respectively) is running.
    AQUARIUS
    AQUARIUS --- ---
    They installed Zabbix at work as part of a solution. Since we use Nagios everywhere else, I need advice on how to monitor Zabbix itself (both the server and the agents) with Nagios. Is there an existing solution for this? I can write the plugins myself if needed.