Core Data Curators - Introductions


#1

We invite you to introduce yourself here. Suggested items to cover - you can add whatever else you like:

  • Your name
  • Your background and current area of work
  • Your skills
  • Why you are interested in contributing

#2

Hello.

I’m Edafe and I’m a data geek. I help organisations get the most out of their data, without the tech speak.

I’m a consultant, trainer and data tamer. My specialty is data strategy but I love data in all forms, shapes and sizes. I also believe in “knowledge for everyone” and the power of collaboration.

That’s why I’m here.

I’ve worked with data for 18 years, mainly in database architecture, modelling and development. My SQL skills are sharp, I can Excel with the best of them and I’m picking up Python and R by the day.

That’s me in a nutshell!


#3

I’m Phil and I’m a software developer.

I’ve interests in linked and open data, the semantic web, and open source software. I believe that open data is empowering for the general public and this makes it very important; I feel the same way about open source software. Linked data and the semantic web are important to me as I see them as a natural evolution for the world wide web and, as such, great tools for change.

I’ve a growing interest in the ethics of IT and Computer Science, and in data protection for private individuals.

Technical bit:
I’ve worked with several programming languages in the last 20 years. My languages of choice are Python and C#, also Java when scraping. I also know SQL, SPARQL, CSV, JSON, XML, RDF.

I have experience scraping web pages, MS Office documents, PDF etc. I’ve worked with Excel and Open Refine. All useful stuff in the open data world.


#4

Hi,

My name is Sandeep and I work as a data analyst by day at an oil and gas firm in Houston. I am pretty good with Excel, mostly work with SQL and have programmed a lot in R. I have a lot of experience using different GIS tools and dealing with spatial data. I am constantly trying to learn new data visualization and analysis techniques when time permits.

I am interested in open data because I have realized its power in changing the world (no exaggeration) and am a bit disappointed that I could not take some time off to work on previous open data projects. I hope to contribute my skills and learn a lot from like-minded individuals working on some of this stuff.

Thanks


#5

Hi there, my name is David.

I am a mathematician and have been working as a software developer for 4 years, doing mostly analysis and prediction based on log files. I work with Hadoop, mostly with PigLatin and naturally Java, but also know my way around in R and bash.

I think “open” as default would make us a better society and love raging about closed and proprietary stuff. So I welcome this opportunity to eventually contribute something, as well as getting to know new interesting people and techniques.


#6

Hi @hirntodt @sandeeph2o @EvilPhil @ekoner - great to have your introductions and to learn more about your interests and skills.

Let me know how you are getting on with the Getting Started Guide and, in particular, if you have thoughts on which dataset you’d be most interested in packaging first.


#7

Hi @rufuspollock, I’ve had a quick skim and will be delving into it this weekend.

Anyone up for a Skype / Google hangouts session this weekend to swap notes and tips?

Cheers

Edafe


#8

@ekoner that’s a great idea - and the next thing I’d been planning to propose was a hangout or skype session. Do you have any specific times that work for you this weekend?


#9

Hi @rufuspollock, I’m available at 18:00 GMT today 10.01.15 or 13:00 GMT tomorrow.


#10

Hi everyone, I went through the guide, and I think I can work on nominating as well as cleaning datasets (as long as it does not require complicated regex or something else). My understanding is that we source this data from existing sources such as the World Bank, Wikipedia and data.gov-like websites, as well as other more unstructured sources, but I am not sure how the vetting process will work. Nonetheless this looks interesting and I am eager to work on it.


#11

@sandeeph2o that all sounds good and you are right about approach - generally we want the most authoritative source we can find (modulo licensing - i.e. we want to have open data, or as close to open data as we can get).


#12

@ekoner sorry to miss you today and I unfortunately can’t make that time tomorrow - I’d also like to schedule a bit further ahead so others can join. In the meantime, do you want to take a look at the instructions and post any questions or queries you have?


#13

I’ve made the first comment in the Watercooler (about our first new packaged dataset): Watercooler - Core Datasets


#14

Hi @rufuspollock, I started going through the guide by working on the language code package. At first I was a bit confused when only reading it, but so far I think it’s straightforward when actually working with it (almost done preparing the package, getting it to GitHub soon).

My question right now would be where to discuss data set specific things, in the github issue or somewhere here?

Specifically, I could not find any license information for that data set…


#15

Hi @rufuspollock

I’m happy to curate a dataset and am looking at #10, IBAN / BIC codes (SWIFT) right now. I’m also happy to create a dataset and am researching Irish Property Prices. I’m @PilipMac on GitHub, where we’ve already discussed this (Pilip is the Gaelic version of my name). The research has been a bit slow to start but I’ve a clear week next week.


#16

Hi everyone,

I’m Fred and I currently work as a data analyst in Paris.

I’m quite comfortable with MATLAB and Excel, and now I’m learning Python.

This project inspires me because I think it’s a good opportunity to make the world just a little bit better. I am also glad to meet people with shared interests and I hope to learn a lot of stuff here!


#17

Hi everyone! I’m Evan and I work at UNICEF’s Global Innovation Centre exploring new ways to extend UNICEF’s impact for children around the world. My background is in software development (mostly python and js) and I’ve dealt with plenty of data along the way.

I’ve been maintaining the country codes dataset (http://data.okfn.org/data/core/country-codes) and would be happy to take on some more datasets and tasks.


#18

Hi all! I’m Carmen, doing data management outreach and education at a research university in Chicago. I’m relatively new to the field. I work with a lot of social/physical/health sciences researchers, though I have a special interest in curating data of interest to humanists (I have more of a digital humanities background). I don’t always get to work directly with datasets in my day job, so I’m excited to get my hands dirty.

I’ve read through the guide and I think I understand the process…looking forward to getting started and working with everyone!


#19

Hi everyone! I am Srinivas Kodali. I am a civil engineer working on Intelligent Transportation Systems in India. I build apps, websites and visualizations, and am good with most data analysis languages and tools. My major interests are in GIS, transportation and cities. I am an active member of India’s open data community.

I am currently working on curating GTFS datasets of transit agencies in India and making them open. This data can help in solving a lot of transport issues in the country, and unlike in western countries it doesn’t exist in a standard form. Here is the link for the project: https://github.com/transitmetrics/ntd

I am good with data crawling, cleaning and structuring from websites and even PDFs. Looking forward to working with you all.


#20

@seiteta @evmw @ccas @srinivaskodali welcome all - and great to have you involved.

Take a look at the Getting Started Guide if you haven’t already and if you have any queries or questions (or just want to let folks know how you are getting on) you can drop a message in the Watercooler channel.