Any interest in a SQLite version of the dataset?

Jeff Moser's image
Jeff Moser
Kaggle Admin
Posts 356
Thanks 178
Joined 21 Aug '10 Email user
From Kaggle

We thought about releasing the second dataset as a single compressed SQLite database instead of a set of CSV files. I ultimately decided against this, thinking that it might add too much complexity to importing the data into your favorite set of tools. However, after seeing some discussion from competitors, it seems that many people are importing the CSV data into a database as their first step anyway.

Therefore, I'd like to get your feedback for future data drops on these questions:

1.) Would a SQLite version of the dataset have been more convenient for you than the CSV files?

and a similar question:

2.) Would you have been able to read a SQLite version of the data just as easily as the CSV files? 

My main concern is that I didn't want to prevent anyone from reading the data.

Thanks in advance for your feedback!
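
For reference, the "import the CSV into a database first" step mentioned above might look roughly like the sketch below. It uses only Python's standard `csv` and `sqlite3` modules. The table name `claims`, the column names, and the sample rows are hypothetical stand-ins, not the real competition schema, and every column is stored as text for simplicity.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn, table, csv_file):
    """Create `table` from the CSV header row and insert every data row as text."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Quote identifiers so unusual column names do not break the SQL.
    columns = ", ".join('"{}" TEXT'.format(name.replace('"', '""')) for name in header)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader
    )
    conn.commit()


# Hypothetical sample data standing in for one of the CSV files.
sample = io.StringIO("MemberID,Year,PayDelay\n1,Y1,30\n2,Y1,45\n")

conn = sqlite3.connect(":memory:")  # use a file path to keep the database on disk
load_csv_into_sqlite(conn, "claims", sample)
row_count = conn.execute("SELECT COUNT(*) FROM claims").fetchone()[0]
print(row_count)
```

In practice you would probably replace the all-text columns with types taken from the Data Dictionary, which is part of why building your own schema can be worthwhile.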

 
Domcastro's image Posts 63
Thanks 13
Joined 8 Aug '10 Email user

Hi

I would prefer the CSVs, please

thanks

 
Jason Morris's image Posts 11
Thanks 3
Joined 2 Apr '11 Email user

I am using database tools for my work, so an SQL file would be welcome. Whether it is a CSV file or an SQL file, both can be digested by database programs. Oracle, MySQL, and MS Access can all import CSV files as a matter of course, so there shouldn't be any hurdles with the data being in CSV format.

 
Solo Dolo's image Posts 8
Joined 17 Mar '11 Email user

I much prefer the CSV files.

 
wxov0voxw's image Posts 6
Joined 20 Apr '11 Email user

csv file please

 
B Yang's image Rank 2nd
Posts 195
Thanks 46
Joined 12 Nov '10 Email user

I think a SQL Server backup, an Access .MDB file, or a MySQL database would be more useful. If you have to pick one format, then use the MS Access format because it can be easily imported by the other two DBs.

 
Jeff Moser's image
Jeff Moser
Kaggle Admin
Posts 356
Thanks 178
Joined 21 Aug '10 Email user
From Kaggle

It seems like CSV files are preferred by most people, since they are easy to import using something like the Data Dictionary to guide your own schema creation. I'd tend to avoid proprietary formats like MDB, since they're not as well supported on many platforms as formats like SQLite.

Given the initial feedback, it looks like we should stick with CSVs, but I'm open to additional feedback based on your own import experience.

 
R. Kaan Ozbayrak's image Posts 13
Joined 20 Mar '11 Email user

Please continue with the CSV format.  I would not be able to read SQLite.  Thank you.

 
Zach's image Rank 31st
Posts 292
Thanks 64
Joined 2 Mar '11 Email user
csv is good for me too, but I can see how SQLite would be useful. Could you release both?
 
Chris Raimondi's image Rank 38th
Posts 194
Thanks 90
Joined 9 Jul '10 Email user
I want to make sure I express my love for CSV as well :)
Thanked by Jeff Moser
 
Jeff Moser's image
Jeff Moser
Kaggle Admin
Posts 356
Thanks 178
Joined 21 Aug '10 Email user
From Kaggle

Zach wrote:

csv is good for me too, but I can see how SQLite would be useful. Could you release both?

It seems that CSV is universally understood, and creating your own database helps you understand the data better. A SQLite edition doesn't seem like it would be much of a help, so I'll table the idea for now.

As the competition goes on, I'll continue to watch for other improvements that could make initially working with the data more fun :).

 
B Yang's image Rank 2nd
Posts 195
Thanks 46
Joined 12 Nov '10 Email user

CSV is for weenies. SQLite and other database formats are kind of OK, but to really kick it up a notch, I advocate custom binary file formats.

 
Sali Mali's image Rank 4th
Posts 292
Thanks 113
Joined 22 Jun '10 Email user

May I suggest pipe-delimited files with no quotes around text fields? This is the format that would be most friendly to all tools. Bulk inserts struggle when files contain quotes, and using a pipe rather than a comma means you won't have to quote strings that contain a comma (as long as the string doesn't contain a pipe!).

I documented some issues I came across here:

http://anotherdataminingblog.blogspot.com/2011/05/progress-loading-hhp-data.html
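
As a rough illustration of the pipe-delimited idea, a quoted CSV file could be converted with something like the sketch below. The function name `csv_to_pipe` and the sample rows are made up for the example. It refuses any field that contains a pipe or a line break, since such a field could not be written safely without quotes.

```python
import csv
import io


def csv_to_pipe(src, dst):
    """Rewrite quoted CSV from `src` as unquoted, pipe-delimited text in `dst`."""
    for row in csv.reader(src):
        for field in row:
            # Without quoting, these characters would corrupt the output.
            if "|" in field or "\n" in field or "\r" in field:
                raise ValueError("field cannot be written unquoted: {!r}".format(field))
        dst.write("|".join(row) + "\n")


# Hypothetical sample: the second field contains a comma, so CSV must quote it.
source = io.StringIO('id,name\n1,"Smith, John"\n')
out = io.StringIO()
csv_to_pipe(source, out)
print(out.getvalue())
```

The comma inside the name survives without any quoting because the delimiter is now a pipe.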

 

 
Justin Washtell's image Posts 48
Thanks 15
Joined 26 Aug '10 Email user

Spoken audio on reel-to-reel tapes please. Apache dialect if possible.

Thanked by arbuckle and ccomp
 
inf2207's image Posts 9
Joined 28 Apr '11 Email user
CSV is okay. But SQLite would be nice, too. :-)
 