    SNIPERCZE: Zabbix, Nagios and other monitoring tools
    ATAN
    ATAN --- ---
    Thanks, I'll try that, because in the end it wasn't the connection. For now I've increased the number of the various pollers and things have calmed down a bit.
    SNIPERCZE
    SNIPERCZE --- ---
    Watch out for the housekeeper settings. By default the database cleanup runs every hour, which can hurt monitoring behaviour: timeouts and the like, and with them false reports of machines being unavailable. Raise the interval in the server config file (the HousekeepingFrequency parameter in zabbix_server.conf; I set 12 hours). The effect shows up mainly in CPU load and iowait time on the Zabbix server.
    ATAN
    ATAN --- ---
    DRON: I put an agent on a second, identical computer behind the NAT on the same network as the first one, and it does the same thing, so the problem is somewhere in the connection, probably at the NAT. But active checks run without problems, so I'm just ignoring the rest.
    DRON
    DRON --- ---
    ATAN: ah, then probably not. You have it at 5m (sorry, I see I keep dragging ancient items and triggers along from ancient Zabbix versions). Wouldn't active checks help behind that NAT? That is, if the problem could be in the port forwarding.
    ATAN
    ATAN --- ---
    I also adjusted the settings according to http://www.slideshare.net/xsbr/alexei-vladishev-zabbixperformancetuning and it still didn't help.

    I've also found that the queue contains delayed items (only) from this client.
    QUEUE OF ITEMS TO BE UPDATED	
     	
    Next check	Delayed by	Host	Name
    27 Feb 2013 12:35:31	1h 31m 41s	sailman-1-01	Version of zabbix_agent(d) running
    27 Feb 2013 12:35:32	1h 31m 40s	sailman-1-01	Host name of zabbix_agentd running
    27 Feb 2013 13:35:39	31m 33s	sailman-1-01	Total disk space on C:
    27 Feb 2013 13:35:40	31m 32s	sailman-1-01	Total disk space on D:
    27 Feb 2013 13:35:41	31m 31s	sailman-1-01	Total disk space on E:
    27 Feb 2013 14:05:17	1m 55s	sailman-1-01	File read bytes per second
    27 Feb 2013 14:05:20	1m 52s	sailman-1-01	Number of processes
    27 Feb 2013 14:05:21	1m 51s	sailman-1-01	Outgoing network traffic on Intel(R) PRO/1000 MT Network Connection-QoS Packet Scheduler-0000
    27 Feb 2013 14:05:21	1m 51s	sailman-1-01	Processor load (1 min average)
    27 Feb 2013 14:05:22	1m 50s	sailman-1-01	Outgoing network traffic on Intel(R) PRO/1000 MT Network Connection-WFP LightWeight Filter-0000
    27 Feb 2013 14:05:22	1m 50s	sailman-1-01	Processor load (15 min average)
    27 Feb 2013 14:05:23	1m 49s	sailman-1-01	Outgoing network traffic on WAN Miniport (IPv6)-QoS Packet Scheduler-0000
    27 Feb 2013 14:05:23	1m 49s	sailman-1-01	Processor load (5 min average)
    27 Feb 2013 14:05:24	1m 48s	sailman-1-01	Free swap space
    27 Feb 2013 14:05:24	1m 48s	sailman-1-01	Outgoing network traffic on Intel(R) PRO/1000 MT Network Connection
    27 Feb 2013 14:05:25	1m 47s	sailman-1-01	Outgoing network traffic on WAN Miniport (IP)
    27 Feb 2013 14:05:26	1m 46s	sailman-1-01	Outgoing network traffic on WAN Miniport (IP)-QoS Packet Scheduler-0000
    27 Feb 2013 14:05:27	1m 45s	sailman-1-01	Outgoing network traffic on WAN Miniport (Network Monitor)-QoS Packet Scheduler-0000
    27 Feb 2013 14:05:27	1m 45s	sailman-1-01	System uptime
    27 Feb 2013 14:05:28	1m 44s	sailman-1-01	Free memory
    27 Feb 2013 14:05:28	1m 44s	sailman-1-01	Outgoing network traffic on WAN Miniport (PPPOE)
    27 Feb 2013 14:05:29	1m 43s	sailman-1-01	Outgoing network traffic on RAS Async Adapter
    27 Feb 2013 14:05:30	1m 42s	sailman-1-01	Agent ping
    27 Feb 2013 14:05:30	1m 42s	sailman-1-01	Outgoing network traffic on WAN Miniport (SSTP)
    27 Feb 2013 14:05:31	1m 41s	sailman-1-01	Outgoing network traffic on WAN Miniport (IKEv2)
    27 Feb 2013 14:05:32	1m 40s	sailman-1-01	Outgoing network traffic on WAN Miniport (L2TP)
    27 Feb 2013 14:05:33	1m 39s	sailman-1-01	Free disk space on C:
    27 Feb 2013 14:05:33	1m 39s	sailman-1-01	Outgoing network traffic on WAN Miniport (PPTP)
    27 Feb 2013 14:05:34	1m 38s	sailman-1-01	Free disk space on D:
    27 Feb 2013 14:05:34	1m 38s	sailman-1-01	Outgoing network traffic on Microsoft ISATAP Adapter
    
    ATAN
    ATAN --- ---
    Now it came again after 13 minutes. And the PROBLEM state always lasts 0s.
    ATAN
    ATAN --- ---
    DRON:
    Which one? The unreachable one is the standard one from the template: {sailman-1-01:agent.ping.nodata(5m)}=1
    My own one on the agent side, which runs a batch file with ping, works as it should: {sailman-1-01:system.run[c:\zabbix\ping.bat,wait].regexp(down)}#0
    ATAN
    ATAN --- ---
    SNIPERCZE:

    There is something odd in the log, but I can't think of why it's happening.

    This is from the report:
    Time	Host	Description	Status	Severity	Duration	Ack	Actions
    27 Feb 2013 12:37:30	sailman-1-01	Zabbix agent on sailman-1-01 is unreachable for 5 minutes	OK	High	4m 13s	No	
    Ok
    27 Feb 2013 12:37:30	sailman-1-01	Zabbix agent on sailman-1-01 is unreachable for 5 minutes	PROBLEM	High	0s	No	
    Ok
    


    This is from the log:
      9303:20130227:123017.330 Zabbix agent item [net.if.in[WAN Miniport (IKEv2)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9297:20130227:123019.397 Zabbix agent item [net.if.in[WAN Miniport (PPTP)]] on host [sailman-1-01] failed: another network error, wait for 15 seconds
      9301:20130227:123019.405 Zabbix agent item [perf_counter[\234(_Total)\1404]] on host [sailman-1-01] failed: another network error, wait for 15 seconds
      9304:20130227:123034.989 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9298:20130227:123038.463 Zabbix agent item [vfs.fs.size[E:,free]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9304:20130227:123053.081 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9297:20130227:123116.937 Zabbix agent item [net.if.in[WAN Miniport (SSTP)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9304:20130227:123131.183 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9297:20130227:123136.103 Zabbix agent item [net.if.out[WAN Miniport (PPTP)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123151.066 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9297:20130227:123206.612 Zabbix agent item [net.if.in[WAN Miniport (Network Monitor)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123222.227 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9301:20130227:123316.421 Zabbix agent item [net.if.in[WAN Miniport (SSTP)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9304:20130227:123331.278 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9303:20130227:123422.996 Zabbix agent item [net.if.out[WAN Miniport (IPv6)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123437.333 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9298:20130227:123440.492 Zabbix agent item [vfs.fs.size[D:,pfree]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9304:20130227:123458.317 Zabbix agent item [vfs.fs.size[D:,pfree]] on host [sailman-1-01] failed: another network error, wait for 15 seconds
      9304:20130227:123513.377 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9301:20130227:123518.333 Zabbix agent item [net.if.in[WAN Miniport (L2TP)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123533.453 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9303:20130227:123623.620 Zabbix agent item [proc.num[]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9304:20130227:123638.687 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9297:20130227:123713.350 Zabbix agent item [net.if.in[WAN Miniport (Network Monitor)-QoS Packet Scheduler-0000]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123728.556 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9301:20130227:123818.554 Zabbix agent item [perf_counter[\234(_Total)\1402]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123834.632 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
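
A minimal sketch of how a log excerpt like the one above could be tallied, to see how often each kind of network error occurs and whether the errors cluster on particular item keys. The regular expressions are written against the two message formats visible in the excerpt only; `summarize` and `SAMPLE` are hypothetical names, and other Zabbix versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns for the two message types seen in the excerpt above.
ERROR_RE = re.compile(
    r"^\s*\d+:\d{8}:\d{6}\.\d{3} Zabbix agent item \[(?P<item>.+)\] "
    r"on host \[(?P<host>[^\]]+)\] failed: (?P<kind>first|another) network error"
)
RESTORED_RE = re.compile(
    r"^\s*\d+:\d{8}:\d{6}\.\d{3} resuming Zabbix agent checks "
    r"on host \[(?P<host>[^\]]+)\]: connection restored"
)


def summarize(lines):
    """Count error and restore messages per host, and errors per item key."""
    summary = {"errors": Counter(), "restored": Counter(), "items": Counter()}
    for line in lines:
        match = ERROR_RE.match(line)
        if match:
            summary["errors"][(match["host"], match["kind"])] += 1
            summary["items"][match["item"]] += 1
            continue
        match = RESTORED_RE.match(line)
        if match:
            summary["restored"][match["host"]] += 1
    return summary


# Three lines copied from the excerpt above, used as sample input.
SAMPLE = [
    r"  9303:20130227:123017.330 Zabbix agent item [net.if.in[WAN Miniport (IKEv2)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds",
    r"  9297:20130227:123019.397 Zabbix agent item [net.if.in[WAN Miniport (PPTP)]] on host [sailman-1-01] failed: another network error, wait for 15 seconds",
    r"  9304:20130227:123034.989 resuming Zabbix agent checks on host [sailman-1-01]: connection restored",
]

if __name__ == "__main__":
    result = summarize(SAMPLE)
    print(result["errors"])
    print(result["restored"])
```

If the errors are spread evenly over unrelated item keys, as they appear to be in the excerpt, that points at the connection rather than at any single check.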
    
    DRON
    DRON --- ---
    ATAN: how is that trigger written? Against the last value?
    SNIPERCZE
    SNIPERCZE --- ---
    When you look at the log on the Zabbix server, are there any messages for this particular machine, or is it clean?
    ATAN
    ATAN --- ---
    On the client I also run, from the server via system.run, a batch file that pings the Zabbix server's domain, and that never fails at all (it pings 5 times, and only if none of them succeeds does it send the state that activates the trigger; tested, the trigger works).
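
The batch file itself is not shown in the thread. As a rough sketch of the logic described (up to five pings, "down" reported only if none succeeds), assuming the exit code of the system `ping` command is a usable success signal: `ping_once`, `link_state` and the host name are hypothetical, and this is not the poster's actual script.

```python
import platform
import subprocess


def ping_once(host):
    """Send a single ping; treat exit code 0 as success (an assumption)."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def link_state(host, attempts=5, ping=ping_once):
    """Return "up" as soon as one ping succeeds, "down" if all attempts fail."""
    for _ in range(attempts):
        if ping(host):
            return "up"
    return "down"


if __name__ == "__main__":
    # Hypothetical host name; the word printed is what a regexp(down)
    # trigger like the one quoted in the thread would match against.
    print(link_state("zabbix.example.com"))
```

Requiring all five attempts to fail keeps a single lost packet from firing the trigger, which may be why this check stays quiet while the passive agent checks flap.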
    ATAN
    ATAN --- ---
    It's under practically no load at all, and as far as I can see, the data has gaps for other triggers too.

    E.g. processor load 1 min avg should have data for every minute:

    2013.Feb.26 14:37:21 0.0333
    2013.Feb.26 14:36:21 0.0667
    2013.Feb.26 14:35:21 0.0167
    2013.Feb.26 14:34:36 0
    2013.Feb.26 14:32:21 0.0167
    2013.Feb.26 14:31:21 0.0167
    2013.Feb.26 14:28:21 0
    2013.Feb.26 14:26:21 0.0167
    2013.Feb.26 14:25:21 0.0167
    2013.Feb.26 14:24:21 0.0167
    2013.Feb.26 14:22:21 0.0167
    2013.Feb.26 14:21:21 0.0167
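
A small sketch for locating the holes in a history listing like the one above, assuming one sample is expected every 60 seconds. `parse_stamp`, `find_gaps` and the 30-second tolerance are illustrative choices, not anything Zabbix provides.

```python
from datetime import datetime, timedelta

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# The rows from the post above: date, time, value.
HISTORY = """\
2013.Feb.26 14:37:21 0.0333
2013.Feb.26 14:36:21 0.0667
2013.Feb.26 14:35:21 0.0167
2013.Feb.26 14:34:36 0
2013.Feb.26 14:32:21 0.0167
2013.Feb.26 14:31:21 0.0167
2013.Feb.26 14:28:21 0
2013.Feb.26 14:26:21 0.0167
2013.Feb.26 14:25:21 0.0167
2013.Feb.26 14:24:21 0.0167
2013.Feb.26 14:22:21 0.0167
2013.Feb.26 14:21:21 0.0167
"""


def parse_stamp(row):
    """Parse the timestamp of a row such as '2013.Feb.26 14:37:21 0.0333'."""
    date_part, time_part, _value = row.split()
    year, month, day = date_part.split(".")
    hour, minute, second = (int(part) for part in time_part.split(":"))
    return datetime(int(year), MONTHS[month], int(day), hour, minute, second)


def find_gaps(stamps, interval=timedelta(seconds=60),
              tolerance=timedelta(seconds=30)):
    """Return (previous, next, missed_samples) for every oversized step."""
    ordered = sorted(stamps)
    gaps = []
    for previous, following in zip(ordered, ordered[1:]):
        delta = following - previous
        if delta > interval + tolerance:
            gaps.append((previous, following, round(delta / interval) - 1))
    return gaps


if __name__ == "__main__":
    stamps = [parse_stamp(row) for row in HISTORY.splitlines() if row.strip()]
    for previous, following, missed in find_gaps(stamps):
        print(previous.time(), "->", following.time(), "missed", missed)
```

Lining up the missing minutes with the "network error" timestamps in the server log would show whether the gaps and the unreachable events share a cause.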
    SNIPERCZE
    SNIPERCZE --- ---
    Is the machine overloaded, so the agent doesn't manage to respond before the timeout?
    ATAN
    ATAN --- ---
    I'm currently dealing with one Zabbix client. An agent runs on a 64-bit Windows 2008 machine, and several times a day its "agent unreachable" trigger fires and is immediately deactivated (the log gives the event duration as 0s). The computer is on the internet behind NAT and the agent port is forwarded. To me it looks completely random, but on average it happens roughly once an hour. It doesn't happen with the agents on the other PCs (Windows and Linux). Does anyone know what could be causing this?
    SAMGARR
    SAMGARR --- ---
    Does anyone here use http://realopinsight.com/en/index.php? I'm trying to compile it on Arch Linux and I keep ending up with an error...
    DRON
    DRON --- ---
    Hell... I just made a typo, wrote #0 instead of #1... well, at least you can see who reads those alarms :-)))
    DRON
    DRON --- ---
    SAMGARR: thanks. While googling nodata() I came across https://www.zabbix.com/forum/showpost.php?p=124666&postcount=4 which is exactly the workaround for the vanished status
    SAMGARR
    SAMGARR --- ---
    DRON: for triggers like that I use nodata()
    DRON
    DRON --- ---
    I haven't looked into it much yet, but is it just me, or did support for the status.last() expression disappear in Zabbix 2.0? I always had something like {server:status.last(0)}=2 in my triggers, which checked that at least some update of any item had arrived from the server, so the server is alive. Now I rather miss it.
    SAMGARR
    SAMGARR --- ---
    SNIPERCZE: alternatively, use the zabbix_get utility
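
One way zabbix_get could be used for this kind of check is to query the agent repeatedly from the server and count failures. The sketch below assumes zabbix_get is on the PATH and uses its `-s` (host), `-p` (port) and `-k` (item key) options; `build_cmd`, `probe`, the host name and the loop parameters are hypothetical.

```python
import subprocess
import time


def build_cmd(host, key="agent.ping", port=10050):
    """Build the zabbix_get command line for one passive check."""
    return ["zabbix_get", "-s", host, "-p", str(port), "-k", key]


def probe(host, key="agent.ping", port=10050, timeout=10):
    """Run zabbix_get once and return (ok, output_or_error)."""
    try:
        result = subprocess.run(
            build_cmd(host, key, port),
            capture_output=True, text=True, timeout=timeout,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError) as exc:
        return False, str(exc)
    output = result.stdout.strip()
    # Treat an empty answer as a failure even if the exit code is 0.
    ok = result.returncode == 0 and output != ""
    return ok, output or result.stderr.strip()


if __name__ == "__main__":
    # Hypothetical host and schedule: 20 probes, 15 seconds apart.
    failures = 0
    for _ in range(20):
        ok, detail = probe("agent.example.com")
        if not ok:
            failures += 1
            print(time.strftime("%H:%M:%S"), "failed:", detail)
        time.sleep(15)
    print("failures:", failures)
```

Running this from the Zabbix server exercises the same path as the server's passive checks, so intermittent failures here would suggest the problem is in the network path rather than in the server's pollers.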