    SNIPERCZE Zabbix, Nagios and other monitoring tools
    ATAN
    ATAN --- ---
    Now it came again after 13 minutes, and every time the PROBLEM state lasts 0s.
    ATAN
    ATAN --- ---
    DRON:
    Which one? The unreachable one is the standard one from the template: {sailman-1-01:agent.ping.nodata(5m)}=1
    My own one on the agent side, which runs a batch file with ping, works as it should: {sailman-1-01:system.run[c:\zabbix\ping.bat,wait].regexp(down)}#0
    ATAN
    ATAN --- ---
    SNIPERCZE:

    There is something odd in the log, but I can't think of why it's happening.

    This is from the report:
    Time	Host	Description	Status	Severity	Duration	Ack	Actions
    27 Feb 2013 12:37:30	sailman-1-01	Zabbix agent on sailman-1-01 is unreachable for 5 minutes	OK	High	4m 13s	No	
    Ok
    27 Feb 2013 12:37:30	sailman-1-01	Zabbix agent on sailman-1-01 is unreachable for 5 minutes	PROBLEM	High	0s	No	
    Ok
    


    This is from the log:
      9303:20130227:123017.330 Zabbix agent item [net.if.in[WAN Miniport (IKEv2)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9297:20130227:123019.397 Zabbix agent item [net.if.in[WAN Miniport (PPTP)]] on host [sailman-1-01] failed: another network error, wait for 15 seconds
      9301:20130227:123019.405 Zabbix agent item [perf_counter[\234(_Total)\1404]] on host [sailman-1-01] failed: another network error, wait for 15 seconds
      9304:20130227:123034.989 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9298:20130227:123038.463 Zabbix agent item [vfs.fs.size[E:,free]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9304:20130227:123053.081 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9297:20130227:123116.937 Zabbix agent item [net.if.in[WAN Miniport (SSTP)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9304:20130227:123131.183 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9297:20130227:123136.103 Zabbix agent item [net.if.out[WAN Miniport (PPTP)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123151.066 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9297:20130227:123206.612 Zabbix agent item [net.if.in[WAN Miniport (Network Monitor)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123222.227 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9301:20130227:123316.421 Zabbix agent item [net.if.in[WAN Miniport (SSTP)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9304:20130227:123331.278 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9303:20130227:123422.996 Zabbix agent item [net.if.out[WAN Miniport (IPv6)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123437.333 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9298:20130227:123440.492 Zabbix agent item [vfs.fs.size[D:,pfree]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9304:20130227:123458.317 Zabbix agent item [vfs.fs.size[D:,pfree]] on host [sailman-1-01] failed: another network error, wait for 15 seconds
      9304:20130227:123513.377 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9301:20130227:123518.333 Zabbix agent item [net.if.in[WAN Miniport (L2TP)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123533.453 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9303:20130227:123623.620 Zabbix agent item [proc.num[]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9304:20130227:123638.687 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9297:20130227:123713.350 Zabbix agent item [net.if.in[WAN Miniport (Network Monitor)-QoS Packet Scheduler-0000]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123728.556 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
      9301:20130227:123818.554 Zabbix agent item [perf_counter[\234(_Total)\1402]] on host [sailman-1-01] failed: first network error, wait for 15 seconds
      9305:20130227:123834.632 resuming Zabbix agent checks on host [sailman-1-01]: connection restored
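
To see how long each interruption in a log like this lasts, the timestamps can be paired up. The following is a minimal sketch, assuming the `PID:YYYYMMDD:HHMMSS.mmm message` layout seen in the excerpt above; the function names are made up for illustration and the sample lines are copied from that excerpt.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt: "PID:YYYYMMDD:HHMMSS.mmm message"
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3})\s+(.*)$")


def parse_line(line):
    """Return (timestamp, message) for one log line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    _pid, day, clock, millis, message = match.groups()
    stamp = datetime.strptime(f"{day} {clock}.{millis}", "%Y%m%d %H%M%S.%f")
    return stamp, message


def outage_durations(lines):
    """Seconds from each 'first network error' to the next 'connection restored'."""
    durations = []
    started = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, message = parsed
        if "first network error" in message and started is None:
            started = stamp
        elif "connection restored" in message and started is not None:
            durations.append((stamp - started).total_seconds())
            started = None
    return durations


SAMPLE_LINES = [
    "  9303:20130227:123017.330 Zabbix agent item [net.if.in[WAN Miniport (IKEv2)]] on host [sailman-1-01] failed: first network error, wait for 15 seconds",
    "  9297:20130227:123019.397 Zabbix agent item [net.if.in[WAN Miniport (PPTP)]] on host [sailman-1-01] failed: another network error, wait for 15 seconds",
    "  9304:20130227:123034.989 resuming Zabbix agent checks on host [sailman-1-01]: connection restored",
    "  9298:20130227:123038.463 Zabbix agent item [vfs.fs.size[E:,free]] on host [sailman-1-01] failed: first network error, wait for 15 seconds",
    "  9304:20130227:123053.081 resuming Zabbix agent checks on host [sailman-1-01]: connection restored",
]

for seconds in outage_durations(SAMPLE_LINES):
    print(f"interruption lasted {seconds:.3f} s")
```

Each interruption here is roughly the 15 second wait that the server itself announces before it retries, so the sketch measures the retry cycle rather than a real outage length.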
    
    DRON
    DRON --- ---
    ATAN: how is that trigger written? On the last value?
    SNIPERCZE
    SNIPERCZE --- ---
    When you look in the log on the Zabbix server, are there any messages for this particular machine, or is it clean?
    ATAN
    ATAN --- ---
    On the client I also run, from the server via system.run, a batch file that pings the Zabbix server's domain, and that one never fails (it pings 5 times, and only if not a single ping succeeds does it send the state that activates the trigger; tested, the trigger works).
    ATAN
    ATAN --- ---
    It is practically not loaded at all, and as far as I can see, the data has gaps in other triggers too.

    For example, processor load 1min avg should have data for every minute:

    2013.Feb.26 14:37:21 0.0333
    2013.Feb.26 14:36:21 0.0667
    2013.Feb.26 14:35:21 0.0167
    2013.Feb.26 14:34:36 0
    2013.Feb.26 14:32:21 0.0167
    2013.Feb.26 14:31:21 0.0167
    2013.Feb.26 14:28:21 0
    2013.Feb.26 14:26:21 0.0167
    2013.Feb.26 14:25:21 0.0167
    2013.Feb.26 14:24:21 0.0167
    2013.Feb.26 14:22:21 0.0167
    2013.Feb.26 14:21:21 0.0167
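
The missing minutes in a listing like this can be found mechanically. Below is a minimal sketch, assuming a one-minute item interval as stated above; the 30 second tolerance and the function name are arbitrary choices for illustration. The clock times are the ones from the listing.

```python
from datetime import datetime, timedelta


def find_gaps(stamps, expected=timedelta(minutes=1), tolerance=timedelta(seconds=30)):
    """Return (earlier, later) pairs whose spacing exceeds expected + tolerance."""
    ordered = sorted(stamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > expected + tolerance:
            gaps.append((earlier, later))
    return gaps


# Clock times of the values listed above, all on 2013-02-26.
CLOCKS = [
    "14:37:21", "14:36:21", "14:35:21", "14:34:36", "14:32:21", "14:31:21",
    "14:28:21", "14:26:21", "14:25:21", "14:24:21", "14:22:21", "14:21:21",
]
STAMPS = [
    datetime(2013, 2, 26, *(int(part) for part in clock.split(":")))
    for clock in CLOCKS
]

for earlier, later in find_gaps(STAMPS):
    print(f"no value between {earlier:%H:%M:%S} and {later:%H:%M:%S}")
```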
    SNIPERCZE
    SNIPERCZE --- ---
    Is the machine overloaded, so the agent can't respond within the timeout?
    ATAN
    ATAN --- ---
    I'm currently dealing with one Zabbix client. An agent runs on a 64-bit Windows 2008 machine, and several times a day its "agent unreachable" trigger fires and is immediately deactivated (the log gives the event duration as 0s). The machine is on the internet behind NAT and the agent port is forwarded. It seems completely random to me, but on average it happens roughly once an hour. It doesn't happen with the agents on the other PCs (Windows and Linux). Does anyone know what could be causing it?
    SAMGARR
    SAMGARR --- ---
    Does anyone here use http://realopinsight.com/en/index.php? I'm trying to compile it on Arch Linux and keep ending up with an error...
    DRON
    DRON --- ---
    Hell... I just made a typo and wrote #0 instead of #1... well, at least it shows who reads those alarms :-)))
    DRON
    DRON --- ---
    SAMGARR: thanks. While googling nodata() I came across https://www.zabbix.com/forum/showpost.php?p=124666&postcount=4 which is exactly the workaround for the vanished status
    SAMGARR
    SAMGARR --- ---
    DRON: for similar triggers I use nodata()
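
As a rough model of what a nodata()-style check expresses: it yields 1 when no value arrived within the given period and 0 otherwise. The sketch below is a simplification for illustration only, not Zabbix's implementation; the function signature and the use of epoch seconds are assumptions.

```python
def nodata(value_times, now, period_seconds):
    """Return 1 if no value arrived in the last period_seconds before now, else 0.

    value_times and now are epoch seconds (an assumption of this sketch).
    """
    cutoff = now - period_seconds
    return 0 if any(cutoff <= t <= now for t in value_times) else 1


# A value 2 minutes ago keeps a 5 minute check quiet; a value 10 minutes ago does not.
now = 1_361_968_650
print(nodata([now - 120], now, 300))
print(nodata([now - 600], now, 300))
```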
    DRON
    DRON --- ---
    I haven't looked into it much yet, but is it just me, or did support for the status.last() expression disappear in Zabbix 2.0? I always had something like {server:status.last(0)}=2 in my triggers, which checked that at least some update of any item had arrived from the server, so the server was alive. I rather miss it now.
    SAMGARR
    SAMGARR --- ---
    SNIPERCZE: or use the zabbix_get utility
    SNIPERCZE
    SNIPERCZE --- ---
    Monitor whether the zabbix-server or zabbix-agentd process is running
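
A process check of this kind could be written as a small Nagios-style plugin. The following is a minimal sketch, assuming a Unix host where `ps -e -o comm=` lists process names, and assuming the binaries are called `zabbix_server` and `zabbix_agentd` (adjust to whatever the local installation uses). The exit codes 0, 2 and 3 follow the usual Nagios plugin convention for OK, CRITICAL and UNKNOWN.

```python
import subprocess

# Conventional Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def evaluate(process_names, wanted):
    """Map the set of running process names to a (exit_code, status_line) pair."""
    missing = [name for name in wanted if name not in process_names]
    if missing:
        return CRITICAL, "CRITICAL - not running: " + ", ".join(missing)
    return OK, "OK - running: " + ", ".join(wanted)


def running_process_names():
    """Return the names of all running processes, as reported by ps."""
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


def main(argv):
    """Check the processes named in argv; the defaults are assumed binary names."""
    wanted = list(argv) or ["zabbix_server", "zabbix_agentd"]
    try:
        names = running_process_names()
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"UNKNOWN - cannot list processes: {exc}")
        return UNKNOWN
    code, status_line = evaluate(names, wanted)
    print(status_line)
    return code
```

To use it as a plugin, the script would end with `sys.exit(main(sys.argv[1:]))` after an `import sys`. A running process does not prove that the agent answers requests, so this covers only the "is it alive" part of the question.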
    AQUARIUS
    AQUARIUS --- ---
    Zabbix was installed at our workplace as part of one solution. Since we use Nagios everywhere else, I need advice on how to monitor Zabbix itself (both the server and the agents) with Nagios. Is there an existing solution for this? I can write the plugins myself if needed.
    SNIPERCZE
    SNIPERCZE --- ---
    ZKOUMAL: I'd try it via the user's media. Or with the escalation, as SAMGARR writes
    SAMGARR
    SAMGARR --- ---
    ZKOUMAL: I'd probably solve it by escalating the action, i.e. steps 1 - 5 send an SMS, then do nothing.
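
To see what such an escalation would mean in practice, the notification times can be worked out by hand or with a few lines of code. This is a minimal sketch, assuming a step duration of one hour (matching the hourly sending described in the question) and that the first step fires when the problem starts; the function name is made up for illustration.

```python
from datetime import datetime, timedelta


def notification_times(problem_start, step_duration=timedelta(hours=1), last_step=5):
    """Times at which steps 1..last_step would send a message."""
    return [
        problem_start + (step - 1) * step_duration
        for step in range(1, last_step + 1)
    ]


# A problem starting at 21:00 on the day from the report above.
start = datetime(2013, 2, 27, 21, 0)
for moment in notification_times(start):
    print(moment.strftime("%Y-%m-%d %H:%M"))
```

Under these assumptions the fifth and last SMS goes out four hours after the problem started, so fewer steps or a longer step duration would be needed to keep the night quiet.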
    ZKOUMAL
    ZKOUMAL --- ---
    In Zabbix I have server monitoring set up, and in the action I have it send notifications every hour from 7 to 22.

    But if the server goes down at 21:00 and nobody acknowledges it, it keeps sending SMS until morning.
    Maintenance status not in "maintenance"
    Time period in "1-7,07:30-22:00"
    Trigger value = "PROBLEM"
    Trigger severity >= "Average"
    Host group = "xxxx"

    The point is that this is not such an important machine, and Zabbix doesn't need to keep reporting it all the way until morning. Is there a way around this?
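
The time period condition above reads as days 1 to 7 (Monday to Sunday) between 07:30 and 22:00. A minimal sketch of that reading follows, assuming a single `d-d,hh:mm-hh:mm` period and an exclusive end time; it only shows how the string is interpreted and says nothing about when Zabbix evaluates the condition. The function name is made up for illustration.

```python
import re
from datetime import datetime

PERIOD_RE = re.compile(r"^(\d)-(\d),(\d{1,2}):(\d{2})-(\d{1,2}):(\d{2})$")


def in_period(period, moment):
    """True if moment falls inside a single 'd-d,hh:mm-hh:mm' period (end exclusive)."""
    match = PERIOD_RE.match(period)
    if match is None:
        raise ValueError(f"unsupported period: {period!r}")
    first_day, last_day, h1, m1, h2, m2 = (int(group) for group in match.groups())
    start_minute = h1 * 60 + m1
    end_minute = h2 * 60 + m2
    now_minute = moment.hour * 60 + moment.minute
    day_ok = first_day <= moment.isoweekday() <= last_day  # Monday is 1, Sunday is 7
    return day_ok and start_minute <= now_minute < end_minute


PERIOD = "1-7,07:30-22:00"
print(in_period(PERIOD, datetime(2013, 2, 27, 21, 0)))
print(in_period(PERIOD, datetime(2013, 2, 27, 23, 0)))
```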