Overflows in total actual memory/swap #102

Open
jsoriano opened this issue Mar 26, 2018 · 16 comments
@jsoriano
Member

Several reports describe implausibly large values for used memory or swap. The reported values are close to the maximum uint64 and are calculated as Used = Total - Free. It seems that at times the reported Free value (actual memory or swap) is greater than the Total, so the unsigned subtraction wraps around.
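To illustrate the wraparound, here is a minimal Go sketch. It is not gosigar code: `usedNaive` and `usedClamped` are hypothetical helpers, and the clamp to zero is only one possible guard. The input numbers are the `system.memory.total` and `system.memory.actual.free` values from the Metricbeat event quoted later in this thread.

```go
package main

import "fmt"

// usedNaive mirrors the Used = Total - Free calculation on unsigned integers.
// When free > total the result wraps around instead of going negative.
func usedNaive(total, free uint64) uint64 {
	return total - free
}

// usedClamped is a hypothetical guard that reports 0 when free exceeds total.
func usedClamped(total, free uint64) uint64 {
	if free > total {
		return 0
	}
	return total - free
}

func main() {
	// Values from the Metricbeat event in this thread (bytes).
	total := uint64(137302667264)      // system.memory.total
	actualFree := uint64(137322749952) // system.memory.actual.free

	// Total - ActualFree is algebraically the same as Used - kern.
	fmt.Println(usedNaive(total, actualFree))   // 18446744073689468928
	fmt.Println(usedClamped(total, actualFree)) // 0
}
```

The wrapped result matches the `actual.used.bytes` value in that event, which supports the theory that ActualFree exceeded Total on that host.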

Cases reported so far:

@jsoriano jsoriano added the bug label Mar 26, 2018
@michbsd

michbsd commented Mar 28, 2018

I am seeing this from two different hosts running 5.6.8. Can I help by providing any information?

@michbsd

michbsd commented Mar 28, 2018

[2018-03-28T12:00:44,407][DEBUG][o.e.a.b.TransportShardBulkAction] [H1Xfe5y] [metricbeat-2018.03.28][0] failed to execute bulk item (index) BulkShardRequest [[metricbeat-2018.03.28][0]] containing [index {[metricbeat-2018.03.28][doc][AWJsDXZ1uaTDC6ArGbwQ], source[{"@timestamp":"2018-03-28T10:00:43.463Z","beat":{"hostname":"nsXXXX.ovh.net","name":"nsXXXX.ovh.net","version":"5.6.8"},"metricset":{"module":"system","name":"memory","rtt":56235},"system":{"memory":{"actual":{"free":137322749952,"used":{"bytes":18446744073689468928,"pct":134350952.106600}},"free":137173819392,"swap":{"free":2139652096,"total":2147479552,"used":{"bytes":7827456,"pct":0.003600}},"total":137302667264,"used":{"bytes":128847872,"pct":0.000900}}},"type":"metricsets"}]}]

@michbsd

michbsd commented Mar 28, 2018

❯ cat /compat/linux/proc/meminfo

	     total:    used:	free:  shared: buffers:	 cached:
Mem:  137302667264 387547136 136915120128 14020608 0 145809408
Swap: 2147479552 7827456 2139652096
MemTotal: 134084636 kB
MemFree:  133706172 kB
MemShared:    13692 kB
Buffers:          0 kB
Cached:      142392 kB
SwapTotal:  2097148 kB
SwapFree:   2089504 kB

@michbsd

michbsd commented Mar 28, 2018

gosigar version:

❯ git log -1 | less
commit 1784bf4
Author: Jaime Soriano Pastor [email protected]
Date: Fri Mar 16 09:53:34 2018 -0700

Readd unreleased section to changelog

@michbsd

michbsd commented Apr 13, 2018

Any news on this? I have completely removed the memory metric from my hosts until this is resolved.

Can I help in any way?

@jsoriano
Member Author

Hi @michbsd,

Thanks for the info provided. I think that using the same logic to collect memory stats on Linux and FreeBSD, as we do now, is problematic, so to solve this on FreeBSD we should probably implement FreeBSD-specific logic. We have also seen related issues on Linux, but they may have a different cause. I'd like to take a look at this soon.

@michbsd

michbsd commented Apr 13, 2018

Let me know if I can be of any assistance.

@michbsd

michbsd commented Jun 6, 2018

Just pinging again about this. More or less all of my servers are affected, so I cannot reliably produce memory stats (graphs). In the meantime, could I just exclude "system.memory.actual.used.bytes", as that seems to be the only field that is misbehaving?

@michbsd

michbsd commented Jun 7, 2018

I found an example of wrong reporting:

cat /usr/compat/linux/proc/meminfo
MemTotal:  66967400 kB
MemFree:   66897268 kB
Buffers:          0 kB
Cached:     2420336 kB
SwapTotal:  2097148 kB
SwapFree:   2097148 kB

If we look at the code in sigar_linux_common.go:

        table := map[string]*uint64{
                "MemTotal": &self.Total,
                "MemFree":  &self.Free,
                "Buffers":  &buffers,
                "Cached":   &cached,
        }

        if err := parseMeminfo(table); err != nil {
                return err
        }

        self.Used = self.Total - self.Free
        kern := buffers + cached
        self.ActualFree = self.Free + kern
        self.ActualUsed = self.Used - kern

With the above numbers, that would be:

Used = (total - free) : 66967400 - 66897268 = 70132
kern = (buffers + cached) : 0 + 2420336 = 2420336

ActualUsed = (Used - kern): 70132-2420336 = -2350204
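The same arithmetic in Go shows what the negative result turns into once it is stored in a uint64. This is a small sketch, not the library code: `actualUsed` is a hypothetical function that repeats the two subtractions from the excerpt, and it uses the kB figures exactly as printed in meminfo (I assume the library converts them to bytes first, which changes the magnitude but not the wraparound).

```go
package main

import "fmt"

// actualUsed repeats the two unsigned subtractions from the quoted Get() code.
func actualUsed(total, free, buffers, cached uint64) uint64 {
	used := total - free
	kern := buffers + cached
	return used - kern // wraps around when kern > used
}

func main() {
	// kB values from the meminfo output above.
	got := actualUsed(66967400, 66897268, 0, 2420336)

	fmt.Println(got)        // 18446744073707201412
	fmt.Println(int64(got)) // -2350204, the signed reading of the same bits
}
```

So whenever Buffers + Cached is larger than MemTotal - MemFree, ActualUsed ends up just below the maximum uint64.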

@michbsd

michbsd commented Jun 7, 2018

Hello again,

I wrote a crude (but working) fix. I have tested it on 2-3 different FreeBSD boxes and it seems to work:

diff --git a/vendor/github.com/elastic/gosigar/sigar_freebsd.go b/vendor/github.com/elastic/gosigar/sigar_freebsd.go
index 602b4a0aa..89a510f46 100644
--- a/vendor/github.com/elastic/gosigar/sigar_freebsd.go
+++ b/vendor/github.com/elastic/gosigar/sigar_freebsd.go
@@ -106,3 +106,49 @@ func parseCpuStat(self *Cpu, line string) error {
        self.Idle, _ = strtoull(fields[4])
        return nil
 }
+
+func (self *Mem) Get() error {
+       val := C.uint32_t(0)
+        sc := C.size_t(4)
+
+        name := C.CString("vm.stats.vm.v_page_count")
+        _, err := C.sysctlbyname(name, unsafe.Pointer(&val), &sc, nil, 0)
+        C.free(unsafe.Pointer(name))
+        if err != nil {
+                return err
+        }
+       pagecount := uint64(val)
+
+        name = C.CString("vm.stats.vm.v_page_size")
+        _, err = C.sysctlbyname(name, unsafe.Pointer(&val), &sc, nil, 0)
+        C.free(unsafe.Pointer(name))
+        if err != nil {
+                return err
+        }
+       pagesize := uint64(val)
+
+        name = C.CString("vm.stats.vm.v_free_count")
+        _, err = C.sysctlbyname(name, unsafe.Pointer(&val), &sc, nil, 0)
+        C.free(unsafe.Pointer(name))
+        if err != nil {
+                return err
+        }
+        self.Free = uint64(val)
+
+        name = C.CString("vm.stats.vm.v_inactive_count")
+        _, err = C.sysctlbyname(name, unsafe.Pointer(&val), &sc, nil, 0)
+        C.free(unsafe.Pointer(name))
+        if err != nil {
+                return err
+        }
+       kern := uint64(val)
+
+       self.Total = uint64(pagecount * pagesize)
+
+        self.Used = self.Total - (self.Free*pagesize)
+        self.ActualFree = (self.Free*pagesize) + (kern*pagesize)
+        self.ActualUsed = self.Used - (kern*pagesize)
+
+        return nil
+}
+
diff --git a/vendor/github.com/elastic/gosigar/sigar_linux_common.go b/vendor/github.com/elastic/gosigar/sigar_linux_common.go
index 7753a7e79..dc325a162 100644
--- a/vendor/github.com/elastic/gosigar/sigar_linux_common.go
+++ b/vendor/github.com/elastic/gosigar/sigar_linux_common.go
@@ -52,26 +52,26 @@ func (self *LoadAverage) Get() error {
        return nil
 }

-func (self *Mem) Get() error {
-       var buffers, cached uint64
-       table := map[string]*uint64{
-               "MemTotal": &self.Total,
-               "MemFree":  &self.Free,
-               "Buffers":  &buffers,
-               "Cached":   &cached,
-       }
-
-       if err := parseMeminfo(table); err != nil {
-               return err
-       }
-
-       self.Used = self.Total - self.Free
-       kern := buffers + cached
-       self.ActualFree = self.Free + kern
-       self.ActualUsed = self.Used - kern
-
-       return nil
-}
+//func (self *Mem) Get() error {
+//     var buffers, cached uint64
+//     table := map[string]*uint64{
+//             "MemTotal": &self.Total,
+//             "MemFree":  &self.Free,
+//             "Buffers":  &buffers,
+//             "Cached":   &cached,
+//     }
+//
+//     if err := parseMeminfo(table); err != nil {
+//             return err
+//     }
+//
+//     self.Used = self.Total - self.Free
+//     kern := buffers + cached
+//     self.ActualFree = self.Free + kern
+//     self.ActualUsed = self.Used - kern
+//
+//     return nil
+//}

 func (self *Swap) Get() error {
        table := map[string]*uint64{
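
For readers who want to check the arithmetic of the patch without cgo, the calculation can be separated from the sysctl calls. The sketch below is an illustration under assumptions, not the code that was merged: `memStats`, `fromPageCounts` and `sub` are hypothetical names, and the clamped subtraction is an extra guard that the patch does not have. It also stores Free in bytes, whereas the patch as posted leaves self.Free as a page count.

```go
package main

import "fmt"

// memStats holds the fields the patch fills in (hypothetical type name).
type memStats struct {
	Total, Free, Used, ActualFree, ActualUsed uint64
}

// sub subtracts b from a, clamping at 0 instead of wrapping around.
func sub(a, b uint64) uint64 {
	if b > a {
		return 0
	}
	return a - b
}

// fromPageCounts takes the four values the patch reads via sysctl
// (v_page_count, v_page_size, v_free_count, v_inactive_count) and
// derives byte totals. All counts are in pages.
func fromPageCounts(pageCount, pageSize, freeCount, inactiveCount uint64) memStats {
	var m memStats
	m.Total = pageCount * pageSize
	m.Free = freeCount * pageSize
	kern := inactiveCount * pageSize // inactive pages treated as reclaimable

	m.Used = sub(m.Total, m.Free)
	m.ActualFree = m.Free + kern
	m.ActualUsed = sub(m.Used, kern)
	return m
}

func main() {
	// Made-up example: 1000 pages of 4096 bytes, 200 free, 300 inactive.
	m := fromPageCounts(1000, 4096, 200, 300)
	fmt.Printf("%+v\n", m)
}
```

Keeping the arithmetic in a pure function like this would make it testable on any platform, with only the sysctl reads left in the FreeBSD-specific file.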

@jsoriano
Member Author

jsoriano commented Jun 7, 2018

@michbsd oh, thanks for taking the initiative on this 🙂 Would you mind opening a pull request with your code?

@michbsd

michbsd commented Jun 7, 2018

Sure. I'm just not sure how to handle the sigar_linux_common part, because I still want to use that file for FreeBSD (e.g. CPU) but not for memory. How can I disregard only a part of the common file?

@jsoriano
Member Author

jsoriano commented Jun 7, 2018

Move the (self *Mem) Get() function from the common file to the sigar_linux.go file.

@michbsd

michbsd commented Jun 7, 2018

OK

@michbsd

michbsd commented Jun 7, 2018

done

@jsoriano
Member Author

jsoriano commented Jun 7, 2018

Continuing discussion about fix for FreeBSD in #106
