Freebase Hack Day liveblogging: data modeling 101

2008 November 9
by Skud

Data modeling 101 with Jeff Prucher.

Types: everything in Freebase has a type. Types are things like person, city, etc.

A type has properties. Properties tell you something about things of that type. Eg. a person has a birthday, children, jobs.

Properties have expected types. Eg. the expected type of a person’s employer would be a business. When you fill in that property, if the thing you fill in isn’t already typed as a business, it will be.
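A rough sketch of that auto-typing behaviour in Python, to make it concrete. This is not Freebase code: the `Topic` class, the `EXPECTED_TYPES` table and `set_property` are all made up for illustration, and the real system may well do more than this.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    types: set[str] = field(default_factory=set)
    properties: dict[str, list["Topic"]] = field(default_factory=dict)


# Hypothetical schema table: property key -> expected type of its value.
EXPECTED_TYPES = {"employer": "business"}


def set_property(subject: Topic, prop: str, value: Topic) -> None:
    """Link value to subject and give value the property's expected type."""
    value.types.add(EXPECTED_TYPES[prop])  # no-op if it is already typed
    subject.properties.setdefault(prop, []).append(value)


alice = Topic("Alice", {"person"})
acme = Topic("Acme")  # not yet typed as a business
set_property(alice, "employer", acme)
print(acme.types)
```

After the call, `acme` carries the `business` type even though nobody typed it explicitly.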

Other properties have primitive expected types, eg. dates, integers, strings. These are just stored as simple data.

Types can be standard (eg. “This thing is of type $type”), enumerated (a limited list that only the admin can add to), or a compound value type (CVT), which stores multiple bits of data about a relationship, eg. a film performance, which stores the actor, the role, and the film.

Simple relationship: “Films have actors”. But if you want to store extra stuff like what role they played, you create a CVT.

CVT topics still have guids and are addressable, but aren’t usually treated as “things” — they don’t have a name, or pictures associated with them, etc.
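To illustrate the film performance CVT, here is a minimal sketch in Python. The class names, the guid strings and the `cast_of` helper are assumptions for the example, not the actual Freebase schema. The point is that the performance node has a guid but no name, and exists only to tie the film, the actor and the role together.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    guid: str  # every node is addressable by guid
    name: str  # ordinary topics have a name


@dataclass
class Performance:
    """CVT node: it has a guid but no name of its own."""
    guid: str
    film: Topic
    actor: Topic
    role: Topic


def cast_of(film: Topic, performances: list[Performance]) -> list[tuple[str, str]]:
    """Return (actor name, role name) pairs for one film."""
    return [(p.actor.name, p.role.name) for p in performances if p.film is film]


# Placeholder data, not real topics.
film = Topic("guid-1", "Example Film")
actor = Topic("guid-2", "Example Actor")
role = Topic("guid-3", "Example Character")
performances = [Performance("guid-4", film, actor, role)]
print(cast_of(film, performances))
```

With the simple “Films have actors” link you could list the actors, but there would be nowhere to hang the role.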

Question: How do you deal with people who say things like, “Why don’t we just use Dublin Core?”

Answer: We draw on existing schemas wherever possible, but the points where they intersect (eg. books intersecting with people around “author”) are where we have to figure things out. Our goal is to be at least broadly compatible with existing schemas.

Chris: start with the simple stuff, start with the things people are really interested in. Don’t try to model every atom in the universe, at least not at first.

Jeff: start with the data you have. Don’t start with a theoretical schema you don’t have data for.

Ray K: could you model software rules (eg. “All freebase topics have an id”) in Freebase?

Jeff: In theory you could, but so far there haven’t been people doing that.

Jack A: Do you have a model for historical events?

Kirrily: Yes, the event type in the Time commons.

Jeff mentioned the phylogeny pattern: a parent/child relationship, or a chain. Example: person has parents/children, where the expected types of both of these are “person”. Also seen in events (includes events, included in events), organizations (parent/child orgs), organism classifications (genus, species, etc).
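A small Python sketch of the phylogeny pattern, using the organism classification case. The `Classification` class and the helpers are hypothetical, and for simplicity each node has a single parent (a person would need a list of parents instead). Both ends of the relationship have the same type, which is what lets you walk the chain.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity; the links are circular
class Classification:
    name: str
    parent: Optional["Classification"] = None
    children: list["Classification"] = field(default_factory=list)


def add_child(parent: Classification, child: Classification) -> None:
    """Record both directions of the parent/child relationship."""
    child.parent = parent
    parent.children.append(child)


def lineage(node: Classification) -> list[str]:
    """Names of all ancestors, nearest first."""
    chain = []
    while node.parent is not None:
        node = node.parent
        chain.append(node.name)
    return chain


family = Classification("Hominidae")
genus = Classification("Homo")
species = Classification("Homo sapiens")
add_child(family, genus)
add_child(genus, species)
print(lineage(species))
```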

Other common patterns:

Time series: to model time series data, use a CVT. Example: a person’s employment is modeled as a CVT that has the person, the employer, the job title, and the start and end dates. Other examples: population of an area over time, military service history, etc.
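The employment example might look something like this in Python. Again a sketch under assumptions: the `EmploymentTenure` name, its fields and the sample data are invented here, and an open-ended job is represented by a missing end date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class EmploymentTenure:
    """CVT-style record: one row per period of employment."""
    person: str
    employer: str
    title: str
    start: date
    end: Optional[date] = None  # None means the job is ongoing


def employers_on(tenures: list[EmploymentTenure], person: str, day: date) -> list[str]:
    """Employers the person worked for on the given day."""
    return [
        t.employer
        for t in tenures
        if t.person == person
        and t.start <= day
        and (t.end is None or day <= t.end)
    ]


# Made-up data for illustration.
tenures = [
    EmploymentTenure("Pat", "Acme", "Engineer", date(2001, 1, 1), date(2004, 12, 31)),
    EmploymentTenure("Pat", "Globex", "Manager", date(2005, 1, 1)),
]
print(employers_on(tenures, "Pat", date(2003, 6, 1)))
```

Population over time or military service history would follow the same shape: the thing, the value, and the dates it applies to.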

Question: how do you deal with schemas changing, keys for types changing, etc?

Jeff/Chris: we try not to change keys in the commons, but you can do whatever you want in your base. If a key changes, applications can break. However, you can change the display name as much as you want. You can add multiple keys (though not through the UI).
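One way to picture the difference between keys and display names, as a toy Python sketch. The `SchemaType` class, the `keys` dictionary and the key strings are hypothetical and only mimic the behaviour described: applications look types up by key, so display names can change freely and extra keys can be added alongside old ones.

```python
class SchemaType:
    def __init__(self, display_name: str) -> None:
        self.display_name = display_name


# Maps stable keys to types; several keys may point at the same type.
keys: dict[str, SchemaType] = {}

performance = SchemaType("Film performance")
keys["film_performance"] = performance

# Renaming for display does not disturb lookups by key.
performance.display_name = "Performance in a film"
assert keys["film_performance"] is performance

# Adding a second key leaves the old one working for existing applications.
keys["performance"] = performance
assert keys["performance"] is keys["film_performance"]

# Removing or renaming a key is what breaks applications:
# after `del keys["film_performance"]`, that lookup would raise KeyError.
```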

Danny H: Are there any standard ways for representing uncertainty?

Jeff: no standard ways have emerged, though we have one example in the commons so far: military casualties, which have a property for who claimed that number. We may need to come back to this problem a few times before we have a general solution.
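A sketch of the “who claimed that number” idea in Python. The `CasualtyClaim` class, its fields and the figures are invented for illustration and are not the commons schema. Each claim keeps its source, so conflicting numbers can sit side by side instead of one overwriting the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CasualtyClaim:
    conflict: str
    casualties: int
    claimed_by: str  # who asserted this figure


def claimed_range(claims: list[CasualtyClaim], conflict: str) -> tuple[int, int]:
    """Lowest and highest figure claimed for one conflict."""
    values = [c.casualties for c in claims if c.conflict == conflict]
    return min(values), max(values)


# Placeholder numbers, not real data.
claims = [
    CasualtyClaim("Example Conflict", 1200, "Side A"),
    CasualtyClaim("Example Conflict", 3000, "Side B"),
]
print(claimed_range(claims, "Example Conflict"))
```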

Chris: you should be on our mailing lists: the data-modeling list to discuss how to create schemas, and the developers list for warnings about schema changes that might affect applications (this happens increasingly rarely).

Question: are there conventions for naming keys?

Chris: the schema editor suggests lowercase with underscores.
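If you wanted to generate keys in that style yourself, something like the following would do. This is an assumption about what “lowercase with underscores” means in practice, not the schema editor’s actual algorithm, and `suggest_key` is a made-up name.

```python
import re


def suggest_key(display_name: str) -> str:
    """Lowercase the name and join its alphanumeric runs with underscores."""
    key = re.sub(r"[^a-z0-9]+", "_", display_name.lower())
    return key.strip("_")


print(suggest_key("Film Performance"))
```

So a display name like “Film Performance” would become the key `film_performance`.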

Jack: suggestion: the Freebase app type could have a property for which parts of the schema each app is using, or there could be some way for people to register for notifications about schema changes even if they don’t want to register their Freebase app (perhaps because they are behind a firewall).
