Database

Page history last edited by David Troy 15 years, 10 months ago

From Dave Troy - the current underlying database schema can be reviewed here:

http://github.com/davetroy/votereport/tree/master/db/schema.rb

 

--

We're still sorting this out.  I made the following map as a way of getting an idea of what we're dealing with.  The most important task is to make sure we build a database that can handle #votereport tweets.

 

I have started the process of designing the database below.  If you can take my abstract ideas and make them concrete, go right ahead and edit the page directly.

 

 

How Tweets are Structured

 

We need a single form for storing all of the information that comes in.  I'm going to focus on the structure of tweets.  All other inputs will have to be translated to match up with this.

 

Each tweet would be broken out into the following fields:

1. Sender ID - where their communication came from, so we can get back to them

2. Sender location - whether it arrives as zipcode or GPS, it should be translated into a specific location

3. Time of report - when the communication came in

4. All conditions reported - e.g. #wait:30, #machine

5. Trust variable - rating of 0-5 (assuming we are able to "bless" certain volunteers, we can weight their reports more heavily)

6. #EP tags -  We might as well store them, and translate the state code into a location, so we have the option of using the data, but keep them separate from the other issue tags
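
As a rough sketch of how the six fields above could be held in one record, here is a minimal Python version. The names `Report`, `parse_tags`, and `ep_states` are hypothetical and do not come from the actual schema.rb. It also assumes that #EP tags look like `#EPOH` (the letters "EP" followed by a two-letter state code) and that the `#votereport` tag itself carries no condition.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Matches "#wait:30" as ("wait", "30") and "#machine" as ("machine", "").
TAG_PATTERN = re.compile(r"#(\w+)(?::(\w+))?")


@dataclass
class Report:
    """One incoming report, using the six fields listed above (names are assumptions)."""
    sender_id: str                      # 1. where the communication came from
    location: Optional[str]             # 2. resolved location (zipcode or GPS translated)
    reported_at: datetime               # 3. when the communication came in
    conditions: Dict[str, Optional[str]] = field(default_factory=dict)  # 4. e.g. {"wait": "30"}
    trust: int = 0                      # 5. rating of 0-5
    ep_states: List[str] = field(default_factory=list)  # 6. state codes from #EP tags
    original_text: str = ""             # the unedited tweet

    def __post_init__(self) -> None:
        if not 0 <= self.trust <= 5:
            raise ValueError("trust must be between 0 and 5")


def parse_tags(text: str) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Split hashtags into issue conditions and #EP state codes, kept separate."""
    conditions: Dict[str, Optional[str]] = {}
    ep_states: List[str] = []
    for name, value in TAG_PATTERN.findall(text):
        if name.lower() == "votereport":
            continue  # assumed: the channel tag is not a condition
        if len(name) == 4 and name.upper().startswith("EP"):
            ep_states.append(name[2:].upper())  # assumed "#EPOH" form
        else:
            conditions[name.lower()] = value or None
    return conditions, ep_states
```

For example, `parse_tags("#votereport #EPOH #wait:30 #machine")` gives the conditions `{"wait": "30", "machine": None}` and the state list `["OH"]`. A real parser would need a stricter rule for #EP tags, since any other four-letter tag starting with "ep" would be misread here.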

 

Using these fields, we can derive general status and heat maps, as well as specific info about problems.

 

How to get heat maps:

- every hashtag that indicates a problem can be counted as +1 for that geographic area

- every hashtag that indicates "all clear" can be counted as -1 for that geographic area

- the database should automatically generate this reading in a new field, so that any app can pull a ready-made heat reading instead of calling all the various data every time
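
The +1 / -1 counting above could look something like this in Python. This is only a sketch: the tag sets are placeholders (which hashtags count as a problem or as "all clear" has not been decided on this page), and `heat_by_area` is a hypothetical name. In the real system the result would be written to the pre-generated field rather than recomputed on each request.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Placeholder tag sets; the actual lists are still to be agreed on.
PROBLEM_TAGS = {"machine", "wait"}
ALL_CLEAR_TAGS = {"good"}


def heat_by_area(reports: Iterable[Tuple[str, List[str]]]) -> Dict[str, int]:
    """Sum +1 per problem tag and -1 per all-clear tag for each geographic area.

    Each report is an (area, tags) pair, where area is e.g. a zipcode.
    """
    heat: Dict[str, int] = defaultdict(int)
    for area, tags in reports:
        for tag in tags:
            if tag in PROBLEM_TAGS:
                heat[area] += 1
            elif tag in ALL_CLEAR_TAGS:
                heat[area] -= 1
    return dict(heat)
```

With two problem tags and one all-clear from the same zipcode, that area ends up with a heat reading of 1. Tags that are in neither set are ignored.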

 

We can also derive wait times for specific locations by compiling all of the submitted data

- this should also be pre-compiled within the database 
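
One possible way to compile wait times per location, sketched in Python. How the submitted values should be combined has not been decided here; the median is used below purely as an assumption, because it is less sensitive to a single mistyped report than the mean. `wait_by_location` is a hypothetical name.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def wait_by_location(reports: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Combine reported waits (in minutes) into one figure per location.

    Each report is a (location, minutes) pair, e.g. from a "#wait:30" tag.
    """
    waits: Dict[str, List[int]] = defaultdict(list)
    for location, minutes in reports:
        waits[location].append(minutes)
    # Assumption: the median is the compiled value stored for each location.
    return {location: median(values) for location, values in waits.items()}
```

A production version would probably also want to weight recent reports more heavily, since a wait reported at 7am says little about the line at noon.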

 

Resolving location issues

 

- if we want to get precinct-specific data, we'll need to build that into the database.  Apparently this is state-by-state and hard to get?

- otherwise we'll just have to get zipcode info, and then use our noggins to try to tie it back to specific precincts

- probably lots more to figure out here....

 

Sweeper Data Cleanup Interface

 

- If we want really good data, we'll need sweepers who can watch incoming tweets and 'encode' them.  The sweepers will be able to look at a tweet, and then edit the content of each database field associated with that tweet.

All tweets will need to be tagged with one of the following statuses:

- not reviewed

- sent question to tweeter, awaiting response

- reviewed

Once a tweet has been reviewed, the data in the database will be updated accordingly.  There will be a field containing the original tweet, but the apps will all use the updated data.

If we are distributing this task out to many volunteers, we'll want to have edits confirmed by multiple sweepers.
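
The review statuses and the multi-sweeper confirmation could be sketched as follows. The threshold of two matching edits is an assumption (no number has been agreed on), and `ReviewStatus` and `confirmed_edit` are hypothetical names.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Optional


class ReviewStatus(Enum):
    NOT_REVIEWED = "not reviewed"
    AWAITING_RESPONSE = "sent question to tweeter, awaiting response"
    REVIEWED = "reviewed"


# Assumption: an edit counts as confirmed once two sweepers propose the same value.
REQUIRED_CONFIRMATIONS = 2


def confirmed_edit(
    proposals: Dict[str, str], required: int = REQUIRED_CONFIRMATIONS
) -> Optional[str]:
    """Return the value proposed by at least `required` sweepers, else None.

    `proposals` maps a sweeper ID to the value that sweeper proposed for one field,
    so each sweeper is counted once.
    """
    if not proposals:
        return None
    value, count = Counter(proposals.values()).most_common(1)[0]
    return value if count >= required else None
```

Until `confirmed_edit` returns a value, the field would keep its original data and the tweet would stay out of the `REVIEWED` state.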

 

Alternative Input Streams

 

We have four potential input streams.

  1. #votereport tweets
  2. iPhone app
  3. Voicemail interface
  4. SMS short code

 

Tweets are discussed above.  Here's where to discuss the rest.  I've put them in order of priority.
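
Since every stream has to be translated into the same form as a tweet, each input could pass through one small normalizing step before it reaches the database. The sketch below is only an illustration: the source names, the `normalize` function, and the dictionary keys are all assumptions.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Assumed short names for the four input streams listed above.
SOURCES = {"twitter", "iphone", "voicemail", "sms"}


def normalize(
    source: str, sender: str, text: str, received_at: Optional[datetime] = None
) -> Dict[str, Any]:
    """Translate a raw message from any stream into the common tweet-like form."""
    if source not in SOURCES:
        raise ValueError(f"unknown input stream: {source}")
    return {
        # Prefix the sender with the stream so we know how to get back to them.
        "sender_id": f"{source}:{sender}",
        "reported_at": received_at or datetime.now(timezone.utc),
        "original_text": text.strip(),
        "review_status": "not reviewed",
    }
```

An SMS from a phone number and a tweet from a Twitter user then end up with the same keys, and the hashtag parsing and sweeper review can treat them identically.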

 

iPhone app:

Everyone seems to agree that this is a good idea, and it should be easy to get the right data out of it.

 

SMS Short Code:

Assuming we can get the short code donated, this is a great way to expand our user pool.

 

Voicemail interface:

Many concerns with screening voicemails:

1. Requires infrastructure investment

2. Requires screeners

3. People who would otherwise have sent structured data might opt for voicemail because it's easier for them

 

The automated interface is lighter and should produce useful data, but still has concerns:

1. Will it be confusing to have a number separate from 866-OUR-VOTE?

2. Could we convince 866-OUR-VOTE to add this automated option to their number?

Comments (3)

JonPincus said

at 2:55 pm on Oct 19, 2008

A great start. I'd add in an arc from "Database" to "watchdog groups" -- the raw data feeds are interesting.

Conceptually, consumers of the data want to be able to treat it as coming from a single federated database -- while recognizing that each of the projects has its own database. There are a couple of architectural approaches to this and a lot depends on scale, both in terms of the number of messages coming in and the number of queries. You don't want to be in a situation where one database (or input channel) getting overwhelmed causes the whole system to bog down.

One important thing to get started on now is the data model. Different systems may represent location in different ways (zipcode, geocode, precinct, etc.); what does this look like in the combined database? Are we going to standardize on the Election Protection incident types (which I think Wired is using; ours are a subset)? And so on. We want to have a basic version of this in place by the 10/24 Jam Session so it gets beaten on while there's still time to evolve if necessary.

JonPincus said

at 2:56 pm on Oct 19, 2008

Also, I think it's worth adding "supertweeters" as a node.

Andrew Turner said

at 5:16 pm on Oct 20, 2008

Instead of hooking directly to the "database" why not build a simple HTTP API first that wraps the DB. Then could add XMPP or whatever for other interfaces. This lets any number of publishers or visualization tools hang off the system.
