Can we resurrect product-open-data.com?


#1

The product-open-data.com web site is down. I’ve been told (by Matt Moore) that:

“Basically it’s a php based website, with a mysql backend. It’s been written from scratch by someone, who’s not documented it or even put it under version control. If you’re a good php programmer and are happy to wrangle someone else’s code, then you’re welcome to help out.”

I’d like to take a look at the code and see how much work it would require to get the site back up and the product data back online.


#2

Great to hear from you, and let’s get this back up.

First, some context: the site was set up by Philippe Plagnol, who then transferred it to Open Knowledge, and specifically to Open Knowledge Labs, about a year ago.

Resources:

For my part, I will work to find out which server this is running on and to get you access to that.


#3

Hi. I’ve been in touch with swmerrill and Matt Moore about this. I too am willing to offer any assistance I can to get this site back up and running. My background is more in sysadmin work than in coding, so perhaps that nicely complements swmerrill’s skills.


#4

@mike and @swmerrill this is great. Please take a look at the docs, see what makes sense, and raise any queries. Note that my colleagues and I may be away from the keyboard for the next few days, so apologies if you don’t get a swift response. We hope to get you folks access to start poking around once we are back on the internet at the end of this week!


#5

I’ve had a quick look at the code. I think the best thing to do is to stick it on a webserver and see if we can get it to run. Matt mentioned that it’s available on one of the lab servers. Is it possible to get an account on this server to try setting it up? If not, is it possible to get a dump of the database, so that I can try running it on one of my test machines? Again, Matt mentioned that he thought that there was a copy of the database on the labs server.


#6

@mike @swmerrill this is great. We will double-check with the relevant sysadmins both about where the current code is running and about getting you somewhere on a Labs machine to play around.

/cc @mikechelen


#7

@mike - are you ready to go? If so, can you open an issue at https://github.com/okfn/okfn.github.com/issues with your public key and an outline of what you think you’ll need?

PS: if you are anywhere near London check out http://attending.io/events/open-data-maker-london-feb-2015 next week


#8

@mike @swmerrill how’s this going? Let me know anything Labs folks can do to support.


#9

I downloaded the 100 MB zip file from GitHub to look at the code and I browsed the other project files there, too. The code is poorly documented and I immediately ran into a small bug that kept it from running. There’s also some (probably) useless Python code connected to Django. But the main thing was that the database was not included. There are three large Excel files with manufacturer data but I would have expected a mysqldump file with the product, manufacturer, and other data that we could have used to rebuild a MySQL database.

Last night, I rediscovered a Google doc written by Philippe, the original developer, at https://docs.google.com/document/d/1Y8idf4Boc4ypc3SHyAC1qHCzbYJAtdEUX6uDih7OLuA/edit#heading=h.55vdpk1my9q. It contains a link to a separate location for downloading the data. Clicking it takes you to some other OKFN site but not to the data.

Is there any way to get a copy of the data so I can see whether I can reconstruct the database? Or have the Labs folks set up a server with it so we can test the code and see what needs to be done?


#10
  • Yes, it is not well documented by its original author (who was not a coder, I think).
  • I believe you are right that the Django code can be discarded.
  • The mysqldump won’t be in GitHub, but it will be on the server.

I believe that link takes you to the broken product data website (the one we are trying to fix).

Re getting the data (and the server): as we are getting into proper sysadmin territory, can I ask you to open an issue at https://github.com/okfn/okfn.github.com/issues? That’s the best place for tracking this and the preferred location for Labs sysadmins. Apologies for asking for a new thread, though this one can be just about getting the data dump.


#11

I opened an issue on github to have the data made available as requested.


#12

Here’s where we are at so far.

  • Daniel Fowler made the mysqldump file available.
  • I downloaded it, gunzipped it, created a mysql database named web_pod, and used the mysql source command to read in the data. No errors reported.
  • I created an Apache web site on my personal machine and unzipped the files from GitHub into it.
  • I changed the discover/secret/connexion.php file to use the mysql connection data for the site.
  • I changed the discover/index.php file to point to the appropriate directories on localhost.
  • I changed the short open tag <? to <?php in the relevant PHP files for compatibility with modern PHP, where short_open_tag is often disabled.
  • I pointed my browser to the site and the product-code home page came up fine.
  • The navigation links to okfn.org sites work fine.
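The <? to <?php change in the list above can be scripted instead of done by hand. Here is a minimal Python sketch; the function names are hypothetical and not part of the site's code. It assumes `<?=` echo tags and `<?xml` declarations should be left alone, and it reads and writes each file as Latin-1 bytes so the original encoding and line endings pass through unchanged. It would also rewrite a literal `<?` inside a PHP string, so review the diff before committing.

```python
import re
from pathlib import Path

# Matches a short open tag "<?" plus any spaces/tabs after it, but not
# "<?php", "<?=" (short echo, always available since PHP 5.4) or "<?xml".
SHORT_OPEN_TAG = re.compile(r"<\?(?!php|=|xml)[ \t]*", re.IGNORECASE)


def expand_short_tags(source: str) -> str:
    """Return source with every short open tag rewritten as "<?php "."""
    return SHORT_OPEN_TAG.sub("<?php ", source)


def convert_tree(root: Path) -> list[Path]:
    """Rewrite short open tags in all .php files under root, in place.

    Returns the list of files that were changed.
    """
    changed = []
    for path in sorted(root.rglob("*.php")):
        # Latin-1 maps every byte to exactly one character and back, so
        # non-ASCII text and CRLF/LF line endings survive byte-for-byte.
        original = path.read_bytes().decode("latin-1")
        updated = expand_short_tags(original)
        if updated != original:
            path.write_bytes(updated.encode("latin-1"))
            changed.append(path)
    return changed
```

For example, `convert_tree(Path("discover"))` would rewrite the files under the discover directory and return the ones it touched.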

Next step:

  • The Home, Browse Data, Data Quality, Download Data, and Search links all point to locations (directories) that do NOT exist in the GitHub version of the program (unless I’m missing something). We need to find the rest of the code (for the directories en, navigate, data-quality, download, and search) and add it.

#13

OK, maybe as a first step we can just get you SSH access to the actual server so you can see whether your simple patch fixes things there. That will at least get the site back up, and we can then work on everything else.

@danfowler, are you happy to take this forward between you and @swmerrill?


#14

Yup. I will take it forward with @swmerrill. I’m still traveling, so I’m limited by time/Internet for the next few days.


#15

A couple of notes while we’re waiting for Daniel’s return.

  1. The table with the GTINs (Global Trade Item Numbers, aka barcode numbers) is named gtin. The barcodes (GTINs) themselves are in a field called gtin_cd, and the descriptions or product names are in a field called gtin_nm. So a simple SQL query such as “SELECT gtin_cd, gtin_nm FROM gtin;” will pull out all 921805 entries. (Best, of course, to include a LIMIT or WHERE clause. :))

  2. Was there a plan in place to update the data as new GTINs are issued? I wonder how the original entries were collected. Does the OKFN have a GEPIR account with GS1? (This is not urgent, but I can’t help wondering how long the database will stay useful if we don’t or can’t update more than 20 or 30 entries a day, so I thought I’d raise the issue.)
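To pull all 921805 rows from the gtin table in point 1 without one giant result set, the query can be paged with a WHERE clause on the last key seen plus a LIMIT (keyset pagination). A minimal Python DB-API sketch follows. It uses sqlite3 as a stand-in so it runs without a MySQL server, and it assumes gtin_cd is a text column with distinct values; the function name is hypothetical. With a MySQL driver such as PyMySQL the placeholder style is %s instead of ?.

```python
import sqlite3
from typing import Iterator, Tuple

# Keyset pagination: resume after the last gtin_cd seen instead of using OFFSET.
BATCH_SQL = (
    "SELECT gtin_cd, gtin_nm FROM gtin "
    "WHERE gtin_cd > ? ORDER BY gtin_cd LIMIT ?"
)


def iter_gtins(conn: sqlite3.Connection,
               batch_size: int = 1000) -> Iterator[Tuple[str, str]]:
    """Yield (gtin_cd, gtin_nm) rows in key order, batch_size rows per query."""
    last_key = ""  # assumes gtin_cd is text, so "" sorts before every code
    cur = conn.cursor()
    while True:
        cur.execute(BATCH_SQL, (last_key, batch_size))
        rows = cur.fetchall()
        if not rows:
            break
        yield from rows
        last_key = rows[-1][0]  # next batch starts after the last code seen
```

Each query only touches batch_size rows and can use the index on gtin_cd, which tends to stay fast deep into the table where a large OFFSET would not.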


#16

OK, back online. The target server needs a little attention today, after which I can begin prepping it to host http://product-open-data.com/ properly (it had never actually been running on this server).

@swmerrill if you would like access to the server, you can email the general OKFN sysadmins (sysadmin@okfn.org), referencing this page/issue and requesting membership in the labs_dev group. Of course, I’m quite happy to do the configuration work to get your working code up. At any rate, perhaps the next step is to push your local changes to GitHub so we can pull the working code down to the production server.


#17

@swmerrill I think the “Home”, “Browse Data”, etc. links are meant to be handled by the redirects listed in the site’s original (?) nginx configuration file.


#18

OK, having solved some of the more pressing issues on the server, I was able to install the prereqs for the site and boot it back up. Have a look: http://product-open-data.com/


#19

Congratulations, Dan! The site looks good. I haven’t tried all the download links yet, but the ones I clicked worked well.

We could make it clearer that codes must be 13 digits long in all cases, or else modify the code to accept shorter ones. The data also need to be updated. Still, the site appears to be back to where it was at the end of 2013.
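On accepting shorter codes: an 8-digit EAN-8 or 12-digit UPC-A can be left-padded with zeros to 13 digits without changing its GS1 check digit, because leading zeros add nothing to the weighted sum. A minimal Python sketch of that normalization, assuming the gtin_cd values in the database are stored as 13-digit strings (the function names are hypothetical):

```python
import re


def gs1_check_digit(body: str) -> int:
    """GS1 mod-10 check digit for the digits that precede the check digit."""
    total = 0
    # Weights alternate 3, 1, 3, ... starting from the rightmost digit.
    for i, ch in enumerate(reversed(body)):
        total += int(ch) * (3 if i % 2 == 0 else 1)
    return (10 - total % 10) % 10


def normalize_gtin13(code: str) -> str:
    """Left-pad an 8-, 12- or 13-digit GTIN to 13 digits and verify it."""
    digits = code.strip()
    if not re.fullmatch(r"(?:[0-9]{8}|[0-9]{12,13})", digits):
        raise ValueError(f"not an 8-, 12- or 13-digit code: {code!r}")
    padded = digits.zfill(13)  # leading zeros keep the check digit valid
    if gs1_check_digit(padded[:-1]) != int(padded[-1]):
        raise ValueError(f"bad check digit: {code!r}")
    return padded
```

For example, the 12-digit UPC-A code 036000291452 normalizes to 0036000291452, which is the same zero-prepending the app appears to do. Rejecting codes with a bad check digit up front also separates typos from codes that are simply missing from the database.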

I think it’s going to be a big help.


#20

There is an issue with the Android (and possibly the iOS) app. It gives the error message “Sorry, this GTIN code is not in the database” regardless of what code is entered or scanned, including codes that are found when using a browser. The app appears to prepend a zero to GTINs that are only 12 digits long, so that is likely not the problem.