Biased Training Data Set?

Hi all, one very simple and basic question. I checked the forum but could not find any information about this. The competition description says that 1502 out of 2224 people died, which gives a general chance of surviving of 32.5%. In the training data set, 549 out of 891 people died, which gives a chance of surviving of 38.4%. My first (test) approach was to submit the test data set with a randomly generated survival variable (32.5% survivors), which led to a score of 0.47847. Going by chance, I would have expected a score of about 0.52-0.56. This goes along with various statements in the forum about scores being lower than expected. Performing a simple t test between the death/surviving ratio in total and in the training data set…
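For anyone who wants to check the arithmetic in the post, here is a minimal sketch in Python. It assumes the score is plain accuracy and that each prediction is drawn independently of the true outcome. The function names are made up for illustration, and the true survival rate of the test set is unknown, so the two values tried here are only the overall rate and the training rate.

```python
def survival_rate(died, total):
    """Fraction of people who survived, given the number who died."""
    return (total - died) / total


def expected_random_accuracy(p_guess, p_true):
    """Expected accuracy when 'survived' is guessed at random with
    probability p_guess and the true survival rate is p_true.
    A guess is correct when both say 'survived' or both say 'died'."""
    return p_guess * p_true + (1 - p_guess) * (1 - p_true)


# Counts quoted in the post.
overall = survival_rate(died=1502, total=2224)  # about 0.325
train = survival_rate(died=549, total=891)      # about 0.384

# Assumption: the test set's true rate is one of these two values.
acc_if_overall = expected_random_accuracy(overall, overall)
acc_if_train = expected_random_accuracy(overall, train)

print(f"overall survival rate:  {overall:.3f}")
print(f"training survival rate: {train:.3f}")
print(f"expected accuracy, test rate = overall:  {acc_if_overall:.3f}")
print(f"expected accuracy, test rate = training: {acc_if_train:.3f}")
```

Under these assumptions the expected accuracy comes out at roughly 0.54 to 0.56, which sits inside the range quoted above. A single random submission will scatter around that expectation, so one score on its own does not settle the question.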


