Cumulative summation wrong/overflow results

I am working with Numba CUDA and trying to calculate the cumulative sum of a matrix. Whenever a value exceeds 127, I get overflowed / wrong values. The whole algorithm is done within shared memory.

However, declaring the shared memory as uint8, uint16, uint32, or float32 did not help.

The problem involves both the CPU and GPU sides.
I had only checked the CUDA side (the calculations) and forgot to change the dtype on the CPU side, so the output could only hold values within the range of the dtype predefined there.
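
To show what that host-side limit looks like, here is a minimal sketch in plain NumPy rather than Numba CUDA, so it runs without a GPU. The input values and variable names are made up for illustration, and it assumes the CPU-side output array was int8, which would match the 127 limit but is not confirmed above. The running sum is computed in a wide type (as the kernel might do) and then stored in a narrow type (as the host array would).

```python
import numpy as np

# Hypothetical input: ten values of 20, so the true running sums reach 200.
values = np.full(10, 20, dtype=np.int32)

# Running sum computed in a wide integer type (stands in for the GPU-side math).
wide = np.cumsum(values, dtype=np.int32)

# Assumption: the host output array is int8. Storing the result there
# silently wraps every value above 127 (int8 covers -128..127).
narrow = wide.astype(np.int8)

print("int32 result:", wide)
print("int8 result: ", narrow)
```

The first six entries agree in both arrays; from the seventh on (140 and up) the int8 copy wraps around to negative numbers. Allocating the host array with a dtype wide enough for the largest expected sum, and using the same dtype for the device and shared arrays, should avoid this.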