Go Back   Rhinocerus > Newsgroup > Newsgroup comp.databases.* > Newsgroup comp.databases.theory

Reply
 
Thread Tools Display Modes
  #1 (permalink)  
Old 04-10-2012, 10:04 PM
Tegiri Nenashi
Guest
 
Posts: n/a
Default Reigniting Probability theory debate

Several years ago there was a lively discussion here on c.d.t about
parallels between PT and RT. Most people defended the analogy, while
some refused to accept it. Here is my take:

http://cstheory.stackexchange.com/qu...pts-legitimate

Looking forward for your input.
Reply With Quote
Alt Today
Advertising
 
and become member of Rhinocerus
Standard Sponsored Links

  #2 (permalink)  
Old 04-11-2012, 09:19 PM
Tegiri Nenashi
Guest
 
Posts: n/a
Default Re: Reigniting Probability theory debate

On Apr 10, 2:04*pm, Tegiri Nenashi <tegirinena...@gmail.com> wrote:
> Several years ago there was a lively discussion here on c.d.t about
> parallels between PT and RT. Most people defended the analogy, while
> some refused to accept it. Here is my take:
>
> http://cstheory.stackexchange.com/qu...alogy-between-...
>
> Looking forward for your input.


OK, here is the exact reference:
http://groups.google.com/group/comp....bility&lnk=nl&
The debate started after Keith asserted that "Probability theory as a
generalization of logic is useful". Which logic? Propositional
calculus? Unlike Val I wouldn't challenge that assertion.
Propositional Logic is essentially Boolean Algebra which is rather
simplistic algebraic structure, known to be friendly to
generalizations (e.g. algebra of binary relations). Introductory
chapter of Jone's textbook focused on Propositional calculus and is
very convincing that it can be generalized to Calculus of
Probabilities. However, step up to Predicate Calculus, and it is not
at all evident that there is any connection to Probability Theory.

Reply With Quote
  #3 (permalink)  
Old 04-11-2012, 10:02 PM
compdb@hotmail.com
Guest
 
Posts: n/a
Default Re: Reigniting Probability theory debate

On Wednesday, 11 April 2012 14:19:59 UTC-7, Tegiri Nenashi wrote:

> Propositional Logic is essentially Boolean Algebra which is rather
> simplistic algebraic structure, known to be friendly to
> generalizations (e.g. algebra of binary relations). Introductory
> chapter of Jone's textbook focused on Propositional calculus and is
> very convincing that it can be generalized to Calculus of
> Probabilities. However, step up to Predicate Calculus, and it is not
> at all evident that there is any connection to Probability Theory.


Yes, and predicate logic is exactly Codd's (named attribute) relational algebra.

A (named attribute) relation can be seen as a set of or mapping on multi(-named-)dimensional points. Functional dependencies are properties of relation values and expressions.

Database relational operators are designed so that there is a correspondence between relation expressions and predicates (and predicate expressions aka wffs). The value of a relation expression is the extension of a corresponding predicate (and wff) where the relation value's attributes are the predicate's parameters (and the wff's free variables). A relation expression has an associated predicate (and wff) built from it in a certain way according to its operators and its variables' given predicates (and wffs). The fundamental theorem of the relational model is that IF the body of each relation variable's value is the set of tuples that make a given predicate (or wff) true THEN the body of each relation expression's value is the set of tuples that make that expression's predicate (or wff) true. Eg if the predicate of relation R is R(X,Y) "person X loves person Y" and the predicate of relation variable S is S(Y,Z) "person Y loves food Z" then the expression for(R JOIN S) PROJECT_AWAY Z is EXISTS Z [S(X,Y) AND R(Y,Z)] "there exists a Z such that person X loves person Y and person Y loves food Z" ie "person Xloves person Y who loves some food".

Probability operators for treating relations as probability distributions will do different things (in general) than database operators. They will satisfy different theorems.

A relation with a functional dependency can represent a function. Composition and images are relevant in databases when a relation expression corresponds to a function's (or wff term's) value. To the extent that distributionsare used as (relational or functional) mappings such representation-independent mapping-oriented operators will appear in that system.

So what we can expect is that what the two systems have in common is... they both somehow involve a relation as a set of or mapping on multi(-named-)dimensional points. Correspondences between operators other than ones that are oriented to mappings would be coincidental. I don't call that much of an analogy/parallel.

I doubt that relations are an appropriate abstraction for distributions per se. I expect that a relation is just one constituent of a proper distribution representation (which would include notions of dependent and independent coordinates/variables/attributes and calculating and renormalizing probabilities to sum to 1 and multiplying and summing for conditional and marginal probabilities and binary mappings in particular) and that any relevant operators on relations (which generally won't be database ones) are in turn used to define other operators on distribution representations per se. The paper's abstract says that conditional distributions and marginal distributions are given by selection and projection respectively. However I expect that what is actually the case is that for their relational representation of (some of) a distribution some notion of removing rows and columns happensas part of complex relation operators implementing distribution operations.. That is human vague reminiscence, not semantic correspondence.

philip
Reply With Quote
  #4 (permalink)  
Old 04-12-2012, 12:08 AM
Vadim Tropashko
Guest
 
Posts: n/a
Default Re: Reigniting Probability theory debate

On Apr 11, 2:02*pm, com...@hotmail.com wrote:
....
> I doubt *that relations are an appropriate abstraction for distributions per se. I expect that a relation is just one constituent of a proper distribution representation (which would include notions of dependent and independent coordinates/variables/attributes and calculating and renormalizing probabilities to sum to 1 and multiplying and summing for conditional and marginal probabilities and binary mappings in particular) and that any relevant operators on relations (which generally won't be database ones) are in turn used to define other operators on distribution representations per se. The paper's abstract says that conditional distributions and marginal distributions are given by selection and projection respectively. However I expect that what is actually the case is that for their relational representation of (some of) a distribution some notion of removing rows and columns happens as part of complex relation operators implementing distribution operations. That is human vague reminiscence, not semantic correspondence.
>


I agree and suggest that distribution correspond to not just a
relation, but the relation amended with partitioning of its attributes
into 2 sets. For example given a relation

Classes=[Prof Course Time]
Libkin DB101 Tue200
Libkin DB101 Thu500
Gromov Math Tue200
Gromov Math Thu500
Vianu DB101 Tue200
;

If we split attributes as {Prof }|{Course,Time} then we have
distribution

ProfessorDistribution=[Prof Probability]
Libkin 0.4
Gromov 0.4
Vianu 0.2
;

The same can be done for {Course}|{Prof,Time}

CourseDistribution=[Course Probability]
DB101 0.6
Math 0.4
;

The FD Prof->Course implies that the entropy of CourseDistribution is
lower than ProfessorDistribution:

http://vadimtropashko.wordpress.com/...itioned-table/
Reply With Quote
  #5 (permalink)  
Old 04-16-2012, 10:12 PM
compdb@hotmail.com
Guest
 
Posts: n/a
Default Re: Reigniting Probability theory debate

On Tuesday, 10 April 2012 15:04:59 UTC-7, Tegiri Nenashi wrote:
> Several years ago there was a lively discussion here on c.d.t about
> parallels between PT and RT. Most people defended the analogy, while
> some refused to accept it. Here is my take:
>
> http://cstheory.stackexchange.com/qu...pts-legitimate


Having now read the paper, the situation is as expected. (Except its notionof distribution has a very simple relation representation.)

This paper has nothing to do with the notion of tuples' associated 0-to-1 values that was discussed in comp.databases.theory.

1.
It is a great paper. Although it could be clearer about what it means by "correspond" and "parallel". (Hill also doesn't really understand the relational model: why its values and operators and primitive operator set are whatthey are.)

The paper defines functions representing relations and functions representing probability mass distributions. The functions have domains that are all possible tuples on some attributes. A relation's indicator function returns0-or-1 per whether a tuple is absent from or present in the relation. A distribution's probability function returns a 0-to-1 probability. By definition the sum of a probability function over all tuples is 1.

The paper gives certain definitions and theorems with parallel structure inexpressions and semantics. The definitions and theorems involve the function representations of relations and distributions. (The parallels are not about tuple-based representations per se.) It does not purport to prove themor to make any other claim than that there are a lot of parallels.

2.
The paper does not associate probabilities with tuples in the relation viewpoint. Its use of an indicator function has nothing to do with probabilities.

The paper would be clearer if the indicator functions were boolean instead of 0-or-1. Because Hill uses multiplication (implicitly expressed by adjacency) as a hack to express conjunction: 0-or-1 TIMES 0-or-1 for F-or-T AND F-or-T. Only the distribution semantics has multiplication.

3.
We can define for a distribution a corresponding "positive" relation whose indicator is zero iff the probability is: its tuples are those the distribution gives positive probabilities. A distribution operator maps its arguments' positive relations to its result's positive relation the way its corresponding relation operator maps those positive relations. (There is a homomorphism from distributions to relations.) The paper doesn't mention this. But it isn't needed by the paper!

Since every distribution has probabilities summing to 1, there is at least one tuple in every distribution, and no distribution has an empty positive relation. This is irrelevant to the paper!

Informally, the fact that the relation theorems are parallel to distribution theorems means that they are either about the values returned by functions or about functions; ie they are about non-empty relations. (Formally, they are vacuously true for empty relations.)

Interestingly the (only) non-emtpy 0-attribute relation, holding the (only)0-attribute tuple, corresponds to the (only) distribution on the (only) 0-dimensional space, namely the one giving 1 for its (only) point, the (only)0-dimensional point.

4.
The set of tuples mapped by a distribution to positive values can be represented by a relation. (Namely its positive relation.) So distribution operations that drop indicated attributes (marginals) or that drop tuples per conditions (conditionals) have definitions and theorems reminiscent of those of relation projection and restriction (respectively).

A distribution can be represented by a relation. (Namely the relation got from its positive relation by extending each tuple by an attribute giving its image in the probability function.) So some theorems about distributions as mappings are like theorems about relations as mappings.

A relation can be represented by a relation got from it by extending each tuple by an attribute with the value 1. So some theorems about relations having given attributes plus a 0-to-1 one will be similar for such relations and relations representing distributions. And some theorems will be the verysame for the 0-to-1 extra-attribute relation representations of relations and distributions. Put another way, some theorems about functions on given attributes returning 0-to-1 will apply to both relation indicator functionsand to distribution probability functions.

5.
Is it not obvious what a distribution union is, ie when an event can be generated from either of two independent distributions on the same variables, and that it would involve addition and division by 2, and its positive relation the union of argument positive relations? (Even though that still has nothing to do with corresponding primitive operator sets.)

6.
I don't know what you mean by "the analogy is weak". (FD vs transformation constraint.) The paper is clear about the "analogy". It gives precise definitions and theorems. Read the paper more closely. Don't jump to conclusionsabout the paper's claims and then criticize a misinterpretation of them.

In conclusion (1) (a) there is no correspondence set up with relations let alone empty relations; but there's nothing wrong with that and (b) it happens that there is a correspondence between distributions and non-empty relations; but there's nothing wrong with that and (c) the paper does not use "indicator function value as probability"; but absent relation tuples are nevertheless associated with zero; but there's nothing wrong with that and (2)(a) there is no correspondence set up with primitive operator sets; but there is no such correspondence; so there's nothing wrong with that and (b) distribution union has a simple correspondence with relation union. You don't seem to understand the senses of the correspondences and parallels.

philip
Reply With Quote
  #6 (permalink)  
Old 04-16-2012, 11:46 PM
Sampo Syreeni
Guest
 
Posts: n/a
Default Re: Reigniting Probability theory debate

On Apr 17, 1:12*am, com...@hotmail.com wrote:

I would seriously consider looking at the cloud of the articles
revolving around this one: http://dl.acm.org/citation.cfm?id=588011.588050
.. I mean, those "weak multivalued dependencies" and their ilk -- in
two precise articles which I cannot find, don't remember the author
of, and which I don't have access to anymore thanks to being laid off
-- basically prove that fourth normal form in database normalization
is equivalent to the natural factorization within bayesian
probabilistic networks.

The only difference is that you put in an extra attribute in your
database which denotes the basic marginal probability of your tuple
being true, and then follow the natural rules of probability calculus
when taking joins or cartesian products: the rest of the columns
become attributes/selection conditions upon the state of a closed
world, any union of such states becomes a sum on the probability
column, any intersection becomes a minus, and any join becomes a
multiplication+sum.

The first paper in the series shows that normalizing something to 4NF,
with the probability column present, is consistent with the resulting
database representing a base form Bayesian network. The second one
responds to criticism where it was claimed that the RM-BN-analogy
leads to a contradiction. The way it shows that is essentially the
same which was in its time used to show that not all universal
relations have a decomposition as a natural join of their projections
into their constituent, smaller relations. Only in this case the
argument was played in something of a reverse: since Bayesian networks
can never lead to the kinds of problems described, by their structure,
wrt the relational model, then we can derive an easy contradiction
from the supposed claim of inequivalence.

Seriously, look it up, even if I can't find a proper reference to the
first article. It has something to do with 4NF, Bayesianism, weak
multivalued normal forms, it's only about two little-known articles
with a single author in both, it's probably something that was
published by the ACM, and so on. But I at least thought it was a
beautiful result, if superficially trivial, when I saw it. At the
deeper level it tells us something about why that "independent
component" and "normalization" business happened in the first place,
in the history of the development of the relational model.
Reply With Quote
 
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off




All times are GMT. The time now is 10:50 AM.


Copyright ©2009

LinkBacks Enabled by vBSEO 3.3.0 RC2 © 2009, Crawlability, Inc.