Classification of Large Data Sets by Neural Networks: A Probabilistic Viewpoint
Marcello Sanguineti
2026-01-01
Abstract
A probabilistic approach to the classification of large data sets is presented. For data drawn from distributions that do not satisfy the naive Bayes assumption (that is, when the features are not independent of one another), conditions on the distributions are given that guarantee almost deterministic behavior of the errors in approximation by neural networks. It is shown that mean values of correlations with network computational units, together with the growth of the sizes of their sets of input/output functions, can be used to assess the suitability of networks for classes of tasks characterized by probabilities modeling their relevance for a given type of application.



