Python - Variable iteration

So basically I have a Pandas dataframe named “data” in which:

The first column contains all values from start to end of a recording timestamp;
A variable number of columns contains values in the form of “-1”, “0” or “1”.
Except for the timestamp column (named RecordingTimestamp) all other columns share part of the name. Specifically, all columns are named AOI[***]Hit, where the asterisks stand for the variable part of the name between the square brackets.

What I’d like to do with this data is loop through all the columns, find for each of them the timestamp corresponding to the first “0” (ts 0) and the first “1” (ts 1) and get the result of the subtraction ts1-ts0.
Up to now I managed to create a very basic script that does this only for one column.
Here it is:

ts = data[‘RecordingTimestamp’]
aoi = data[‘AOI[info_text]Hit’]

t1 = np.where(aoi == 0)
t1 = t1[0][0]
t2 = np.where(aoi == 1)
t2 = t2[0][0]

for i in data[‘AOI[info_text]Hit’]:
if i == 0:
val1 = t1
elif i == 1:
val2 = t2
What I’d like it to do eventually is loop through all the columns and print for each of them the name (taken from what’s between the square brackets) and the result of the subtraction (maybe in the form of an ordered dictionary). PLEASE NOTE: if the columns does not contain any “1”, instead of the result of the subtraction ts1-ts0 a string should be printed.

I can’t figure out how to create this kind of loop and any help would be really appreciated.

I had to look up ‘dataframe’ because you didn’t mention you were using Pandas.

From this page, I think the part about iterating through columns is what you want. Something like:

#I'm guessing your dataframe is named 'data'

for i in columns:
    print i   # print the column name.
    #you could filter here to find columns named AOIxxxHit
    # do something with column values: data[i]
for j in data[i]:
# j contains a single row value from the column

Thank you for the quick answer!

By now it seems that I managed how to loop through columns and get a list with column names + a list with the timestamp when value “0” or “1” occurs for the first time. I included also “1” because often there’s no “0” in the columns.  Here’s the code, I hope it’s right:

data = pd.read_excel('/Users/alessandro/Desktop/TtFF_RV/SEBMAR_AOIHit.xlsx')

ts = data['RecordingTimestamp']
columns = list(data)
aois = columns[1:]

start = []
for i in aois:
    for j in data[i]:
        if j == 0 or j == 1:
            time = np.where(data[i] == j)
            time = time[0][0]


The last step would be dropping from the dataframe every column that does not contain at all any “1”. It looked very easy to do, but I really can’t figure it out. Do you have any suggestion about that?

Again any help would be really appreciated!

Thanks in advance

1 Like

Can’t tell you if it’s “correct”, if it works, go with it!

I’m sure there’s a “smarter” way, but my brute-force approach would be to scan each column; if “1” doesn’t exist in any row of the column, then drop it.

hasOne = False

<for each row in column>

    hasOne = hasOne or (data[i] == 1)

if (not hasOne)

    <drop column>