Any interest in a SQLite version of the dataset?

We thought about releasing the second dataset as a single compressed SQLite database instead of a set of CSV files. I ultimately decided against this, thinking that it might make importing the data into your favorite set of tools too complex. However, after seeing some discussion from competitors, it seems that many people are just importing the CSV data into a database as their first step.
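For reference, that "import the CSV into a database" first step can be sketched roughly as follows using only Python's standard library. This is a minimal illustration, not the competition's actual schema: the table name `records`, the function name `load_csv_into_sqlite`, and the sample columns are all made up, and every column is stored without a declared type.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(csv_file, conn, table):
    """Create `table` from the CSV header row and insert every data row.

    Hypothetical helper for illustration; columns get no declared type,
    so the values are stored as text exactly as they appear in the file.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join('"{}"'.format(name) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader
    )
    conn.commit()
    return conn.execute('SELECT COUNT(*) FROM "{}"'.format(table)).fetchone()[0]


# Made-up sample data standing in for one of the CSV files.
sample = io.StringIO("id,name,value\n1,alpha,10\n2,beta,3\n")
conn = sqlite3.connect(":memory:")
row_count = load_csv_into_sqlite(sample, conn, "records")
```

With a real file you would pass `open(path, newline="")` instead of the in-memory sample and connect to a file on disk rather than `:memory:`.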

Therefore, I'd like to get your feedback for future data drops on these questions:

1.) Would a SQLite version of the dataset have been more convenient for you than the CSV files?

and a similar question:

2.) Would you have been able to read a SQLite version of the data just as easily as the CSV files? 

My main concern is that I didn't want to prevent anyone from reading the data.

Thanks in advance for your feedback!


I would prefer the CSVs, please


I use database tools for my work, so an SQL file would be welcome. Whether it is a CSV file or an SQL file, both can be digested by database programs. Oracle, MySQL, and MS Access can all import CSV files as a matter of course, so there shouldn't be any hurdles with the data being in CSV format.

I much prefer the CSV files.

csv file please

I think a SQL Server backup, an Access .MDB file, or a MySQL database would be more useful. If you have to pick one format, then use the MS Access format because it can be easily imported by the other two DBs.

It seems like CSV files are preferred by most people because they are easy to import, using something like the Data Dictionary to guide your own schema creation. I'd tend to avoid proprietary formats like MDBs since they're not quite as well supported on many platforms as others like SQLite.

Given the initial feedback, it looks like we should stick with CSVs, but I'm open to additional feedback based on your own import experience.

Please continue with the CSV format.  I would not be able to read SQLite.  Thank you.

csv is good for me too, but I can see how SQLite would be useful. Could you release both?

I want to make sure I express my love for CSV as well :)

Zach wrote:

csv is good for me too, but I can see how SQLite would be useful. Could you release both?

It seems that CSV is universally understood, and creating your own database helps you understand the data better. A SQLite edition doesn't seem like it would be much of a help, so I'll table the idea for now.

As the competition goes on, I'll continue to watch to see if there are other improvements that can be made to make initially working with the data more fun :).

CSV is for weenies, SQLite and other database formats are kind of OK, to really kick it up a notch, I advocate custom binary file formats.

May I suggest pipe-delimited files with no quotes around text fields? This is the format that would be most friendly to all tools. Bulk inserts struggle when files contain quotes, and having a pipe rather than a comma means you won't have to quote strings that contain a comma (as long as the string doesn't contain a pipe!).
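For anyone who wants that layout today, the conversion from CSV could look something like the sketch below, using only Python's standard library. The function name `csv_to_pipe` and the sample rows are made up for illustration. It refuses any field containing a pipe or a line break, since those could not be written without quoting.

```python
import csv
import io


def csv_to_pipe(src, dst):
    """Rewrite comma-separated input as unquoted, pipe-delimited output.

    Hypothetical helper for illustration. Raises ValueError when a field
    contains a pipe or a line break, because it cannot be written unquoted.
    """
    rows_written = 0
    for row in csv.reader(src):
        for field in row:
            if "|" in field or "\n" in field or "\r" in field:
                raise ValueError(
                    "field cannot be written unquoted: {!r}".format(field)
                )
        dst.write("|".join(row) + "\n")
        rows_written += 1
    return rows_written


# Made-up sample: the quoted field contains a comma, which is harmless
# once the delimiter is a pipe.
source = io.StringIO('id,description\n1,"Smith, John"\n2,plain\n')
output = io.StringIO()
rows_written = csv_to_pipe(source, output)
pipe_text = output.getvalue()
```

Failing loudly on an embedded pipe is a deliberate choice here; silently replacing or dropping the character would change the data.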

I documented some issues I came across here...

Spoken audio on reel-to-reel tapes please. Apache dialect if possible.

CSV is okay. But SQLite would be nice, too. :-)

Stick with CSV files. Feel free to provide different formats as well, but CSV works on all platforms, and converting them is trivial. If people are having difficulties reading CSV, well, let's just say loading the data is the easy part of the competition.

Personally I would have no use for an SQLite file. Give a format that is well documented, and trivial to read even from your own code. CSV is adequate.

Doesn't SQLite support a much more limited set of queries than, say, SQL Express?

Was a SQLite data file ever created? It would be nice to have if so. Thanks


Was a SQLite data file ever created? It would be nice to have if so. Thanks

No, there was little interest, so I abandoned the idea.

