This is my first post here, even though I have lurked on GitHub before. I am also relatively inexperienced with coding and handling “big” data. My background is in Mechanical Engineering, and my coding/data handling has mostly been done in Excel and MATLAB so far, but I would like this to change as I feel I am gravitating more towards Data Engineering/Science.
I am currently working on a project where I have predefined test cases that I need to run and then analyse the resulting data. My aim is to identify system failures against pre-defined criteria. Eventually I would like to present statistics of failures vs test conditions, and ideally get to the point where I can anticipate failure modes before they happen.
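To make the first part concrete, here is a minimal sketch of the kind of check I have in mind, using only the standard library. Everything in it is made up for illustration: the field names (`load_n`, `peak_temp_c`, `vibration_g`), the limit values and the example runs are assumptions, not my real criteria or data.

```python
from collections import defaultdict

# Hypothetical criteria: a run counts as a failure if any signal exceeds its limit.
LIMITS = {"peak_temp_c": 90.0, "vibration_g": 2.5}


def is_failure(run, limits=LIMITS):
    """Return True if any measured signal in the run exceeds its limit."""
    return any(run[name] > limit for name, limit in limits.items())


def failure_rate_by_condition(runs, condition_key, limits=LIMITS):
    """Group runs by one test condition and return the failed fraction per group."""
    counts = defaultdict(lambda: [0, 0])  # condition value -> [failures, total]
    for run in runs:
        bucket = counts[run[condition_key]]
        bucket[0] += is_failure(run, limits)
        bucket[1] += 1
    return {cond: fails / total for cond, (fails, total) in counts.items()}


# Made-up example runs, one dict per test case.
runs = [
    {"load_n": 100, "peak_temp_c": 70.0, "vibration_g": 1.0},
    {"load_n": 100, "peak_temp_c": 95.0, "vibration_g": 1.2},
    {"load_n": 200, "peak_temp_c": 92.0, "vibration_g": 2.8},
    {"load_n": 200, "peak_temp_c": 91.0, "vibration_g": 1.1},
]

rates = failure_rate_by_condition(runs, "load_n")
print(rates)
```

This only covers the simple threshold case per test condition. It says nothing yet about anticipating failures, which is the part I am least sure how to approach.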
In the meantime I also have quasi-infinite amounts of random data generated by systems identical to the one under test, but being used in a random manner “in the real world” (i.e. not staged scenarios). From this random data I would like to be able to identify scenarios which are similar to the staged ones as well as new ones which have not yet been identified as use-cases.
I’m guessing this would qualify as machine learning(?)
I would mostly want to use Python for this as:
a. I am keen to become more proficient with the language (my experience so far has been limited to following tutorials).
b. It is pretty much the only thing I have at my disposal.
So, my main questions at this point are:
What is possible?
Where do I start?
Are there any existing examples of something similar which I can look at?
Thanks for reading this. I look forward to your feedback!
PS: I hope I am not just reiterating a common question. I did have a look around the forum before posting but didn’t quite find anything similar.