Hello! This is my first post here even though I have stalked GitHub before. I am also relatively inexperienced with coding and handling "big" data. My background is in Mechanical Engineering, and my coding/data handling has mostly been done in Excel and MATLAB so far, but I would like this to change as I feel I am gravitating more towards Data Engineering/Science.

I am currently working on a project where I have predefined test cases that I need to run and then analyse the resulting data. My aim is to identify system failures (against predefined criteria). Eventually I would like to present statistics of the failures vs. test conditions and, ideally, get to the point where I can anticipate failure modes before they happen.

In the meantime, I also have quasi-infinite amounts of random data generated by systems identical to the one under test, but used in a random manner "in the real world" (i.e. not staged scenarios). From this random data I would like to identify scenarios which are similar to the staged ones, as well as new ones which have not yet been identified as use cases. I'm guessing this would qualify as machine learning(?)

I would mostly want to use Python for this because:
a. I am keen to become more proficient with the language (my experience so far has been limited to following tutorials).
b. It is pretty much the only thing I have at my disposal.

So, my main questions at this point are: What is possible? Where do I start? Are there any existing examples of something similar which I can look at?

Thanks for reading this. I look forward to your feedback!

PS: I hope I am not just reiterating a common question. I did have a look around the forum before posting but didn't quite find anything similar.