IT tidbit: 2018

Tuesday, September 18, 2018

Higher Education IPEDS College Data UI and RESTful API - definition, charting, demonstration

Updated 04/21/2019: The demonstration of the project can now be found here.

* 11/12/2018: Imported the newest IPEDS data, as of Oct. 24, 2018, into the system. An article demonstrating the use of the new data can be found at EdPond.blogspot.com.

This article describes my desire, my efforts, and my continuing work on an IPEDS College Data User Interface (UI) and Application Programming Interface (API). The application consists of a client, through which users interact with the system, and a server, which is implemented as a RESTful API.

The IPEDS college data is, arguably, the most important data about United States colleges. The data is not only for researchers but also for the general public to learn about colleges. It appears in various reports, such as the US News College Ranking and the College Scorecard.

The IPEDS college data has been available for many years but, like much government data residing on the Internet, it largely lacks a user-friendly application. This seriously restricts the value of the data, since people can't easily arrange it into a digestible form. It is these inconveniences that made me want to build an infrastructure for government data in general.

An earlier system was built around the statistical programming language R, and it carries a lot of my pioneering ideas. That system, however, cannot easily be extended to a modern web-based infrastructure; hence the current new project.

With the desire to build a new system, I initiated the project with the postsecondary IPEDS college data as the focus. To begin, a fundraising attempt was launched on Kickstarter.com. Lacking support from IPEDS participating institutions, I took the work on myself.

As with many long-standing data collections, the IPEDS college data suffers from problems such as definition changes and backward compatibility. And, as IPEDS insiders know, resolving all the issues that come with the survey can take a tremendous amount of effort. For this project, we are not aiming to resolve all of those problems but to provide easier access to the data.

At the outset of the project, two user groups are our major concern:

Students and parents who are looking for colleges.
Researchers.

These two groups will have different scenarios for using the college data and will likely require different user interfaces or applications. One thing they have in common, however, is the need to have all the data in a database for easy retrieval.

The first goal of this project is to get the college data into a database where it can be manipulated. We can then address the accessibility issue. The process of importing the college data into the database is not painless since, as we all know, not all data is clean. This process has been largely refined, as demonstrated in my first YouTube video. It isn't totally automated, but it is good enough for most purposes; do be prepared to be interrupted when there are problems in the raw data.
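The kind of interruption described above can be pictured with a small sketch. It is only an illustration, not the project's actual import code: the column names (UNITID, YEAR, ENROLLMENT), the table layout, and the sample rows are all made up, and SQLite stands in for whatever database the real system uses. Rows that fail to parse are set aside for a person to review instead of being silently repaired.

```python
import csv
import io
import sqlite3

# Hypothetical raw file: the column names and values are illustrative only.
RAW = """UNITID,YEAR,ENROLLMENT
100001,2017,5200
100002,2017,n/a
100003,2017,1310
"""


def import_rows(conn, text):
    """Load clean rows; collect problem rows instead of guessing at a fix."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS enrollment "
        "(unitid INTEGER, year INTEGER, value INTEGER)"
    )
    problems = []
    # Line 1 of the file is the header, so data rows start at line 2.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        try:
            record = (int(row["UNITID"]), int(row["YEAR"]), int(row["ENROLLMENT"]))
        except (KeyError, TypeError, ValueError) as exc:
            # Remember where the dirty value was so someone can look at it.
            problems.append((line_no, str(exc)))
            continue
        conn.execute("INSERT INTO enrollment VALUES (?, ?, ?)", record)
    conn.commit()
    return problems


conn = sqlite3.connect(":memory:")
problems = import_rows(conn, RAW)
loaded = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
print(f"loaded {loaded} rows, {len(problems)} need attention: {problems}")
```

Collecting the problem rows, rather than stopping at the first one, is a design choice of this sketch; a stricter import could just as well halt and wait for the raw file to be corrected.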

For students and parents, our goal is to allow viewing and comparing multiple colleges so that they can make informed decisions.

For researchers, our goal is to provide tools that help identify definition changes and retrieve college data across multiple years.

Once the IPEDS data has been processed into the database, it is then a matter of how to retrieve the needed information. These design decisions can make or break the usefulness of the system.
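As a rough picture of what retrieving data across multiple years could look like once everything sits in one database, here is a minimal sketch. The long-format table (one row per college, year, and variable), the variable names, and the numbers are assumptions made for the example; the real schema is not described in this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: one row per (college, year, variable).
conn.execute(
    "CREATE TABLE measure (unitid INTEGER, year INTEGER, variable TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO measure VALUES (?, ?, ?, ?)",
    [
        (100001, 2016, "ENROLLMENT", 5100),
        (100001, 2017, "ENROLLMENT", 5200),
        (100002, 2016, "ENROLLMENT", 1290),
        (100002, 2017, "ENROLLMENT", 1310),
        (100001, 2017, "TUITION", 9800),
    ],
)


def trend(conn, variable, unitids):
    """Return {unitid: [(year, value), ...]} for one variable, ordered by year."""
    placeholders = ",".join("?" for _ in unitids)
    rows = conn.execute(
        "SELECT unitid, year, value FROM measure "
        f"WHERE variable = ? AND unitid IN ({placeholders}) "
        "ORDER BY unitid, year",
        [variable, *unitids],
    )
    result = {}
    for unitid, year, value in rows:
        result.setdefault(unitid, []).append((year, value))
    return result


enrollment = trend(conn, "ENROLLMENT", [100001, 100002])
for unitid, series in enrollment.items():
    print(unitid, series)
```

The same function serves both user groups in this sketch: comparing colleges is a call with several unitids, and a multi-year trend is simply the ordered series returned for each one.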

To entice the average user to use the college data system, the interface should be easy to learn. It is always true that the more you know, the more easily you can adapt to a system or make better use of it. However, with today's busy schedules, a low learning curve is essential for the majority of the population.

Before the project started, most of the data access logic was implemented and tested in the programming language R with command-line console interfaces. The goal of this new project is to provide our users with a tried and true graphical user interface.

As the project progressed, videos were made to demonstrate the usability. The project won't reach its final stage until the user interface is finalized.

The first progress report can be found at UI+API for IPEDS College Data - Definition, Trend, History, - a progress report. The IPEDS college data system demonstrates a search/filter front end that does not require prior knowledge of the IPEDS survey. After the desired measurements have been selected, the user interface provides a summary table in which the user can click to pick the allowed refinements for those measurements. The user can then retrieve the college data for viewing, as demonstrated in the video. The goal of the video is to show the capability rather than the operation, as the operation will be refined to provide an even better user interface for less technically inclined users.

The second progress report can be found at UI+API for IPEDS College Data - Chart, Trend, Definition - a progress report. The college data access system not only shows the added charting capability but also demonstrates various scenarios for using it to examine the retrieved college data. Again, some of the operations will be refined to provide an even better user experience. The chart configuration interface, however, should already give users a very positive experience with the college data system. By comparison, the first video left the user with a list of records that had to be processed with other software to make sense of the data. The charting capability of the college data access system removes that need. The ease of use of the chart configuration table also makes the project very user friendly.

Please bear in mind that this is just a seed project. I intend to include more government data based on a similar data processing scheme.

====== vvvvv Scratches vvvvvvv =========
This project is about building the basic infrastructure for general government data, with the federal IPEDS college data as the pilot data source.

The IPEDS (Integrated Postsecondary Education Data System) survey has been around since the 1980s and it collects a lot of data from United States colleges, public or private. The data is of interest not only to researchers but also to students and parents.

IPEDS data is publicly available and is used in many reports, such as the US News College Ranking and the College Scorecard.

Even though the IPEDS data is available to the general public, like most government data residing on the Internet it usually requires a certain level of data processing skill to make good use of it.

The purpose of this project is to reduce the barriers for both the general public and researchers. Please view the project video to see what we know is working and how we were able to produce some reports from our current system. The goal of this project is to extend the limited data we have and to provide a better user interface for our users.

As mentioned in the video, we are targeting two groups of users: researchers, and students and parents in general.

For students and parents, we will provide tools that allow them to compare colleges.

For researchers, we will provide tools to help them deal with multi-year trend data.

This is a foundation project and it is for the social good. By adding more government data to the system, citizens can educate themselves and understand our society better.

Supporting this project gives you access to the data and helps us continue funding the development.

Thank you for your support.

* All rewards are early-bird subscriptions of 1 year or less. All surveys published by IPEDS for the most recent 10 years are available with summaries. Generated reports must cite our URL for accountability; we make sure our data matches the ultimate source. Personal research can't be cited/used by an employer. Personal account results can't be published/shared. An institution account means data at the institution level. A sector account means data at the sector level.
Risks and challenges

As mentioned in the project video, for the most part we have tested our approach with limited data. That testing demonstrated our ability to solve the problems we encountered. Admittedly, there will be obstacles, just as there have been so far. Some may simply be related to data clean-up and some may be more technical. But we are confident we can resolve them. Of course, we will need a dedicated web server to host the data online and to provide the data API; one year of that cost is included. If support exceeds the goal, the extra money could be used to support the operation.

================
IPEDS (Integrated Postsecondary Education Data System) data is arguably the most important data source for learning about United States colleges. The data has been available for years. However, without a user-friendly interface/application, the data, like most raw data, is likely under-used.

The goal of this project is to build a user-friendly interface/application based on my past experience in dealing with federal data.

The first video is a buy-in pitch that details the vision, the prospects, and the considerations.

The second and following videos serve as progress reports and, possibly, as tutorials once the system is made available.

Comments welcome ... SsocialDataCenter at(@) gmail.com
===============
This video is also available at the KickStarter project, which aims to make IPEDS US college data easier to access and use.

As mentioned in the project, the IPEDS data has been available for a long time, but a user-friendly application hasn't been readily available. The project builds on a general database scheme that can be extended to other datasets.

For IPEDS, it is intended to provide users with an easily accessible online database and a user-friendly interface that can retrieve data and generate charts.

User support for the project is needed not just to build the system but also to cover the operating cost of keeping the online database on internet servers. Updating the database also costs money, as does improving the application.
===============
This video demonstrates an IPEDS access UI. The project is under construction. The project takes a very general approach, which means it can easily be adapted to other datasets. The project presents the data as it is and does not try to make decisions for researchers about how a variable should be interpreted or whether its definition has changed over the years. We leave these decisions to data professionals. Be aware, however, that not all information is available in the IPEDS data files. When in doubt, the IPEDS documentation should be consulted.

Questions and Comments welcome ... SsocialDataCenter at(@) gmail.com
===============
This video presents a progressive enhancement to an IPEDS access UI project, which takes a general approach to data organization and can therefore be adapted to other datasets easily.

Many of the stated objectives:
    The project presents the data as it is and does not try
    to make decisions for researchers about how a variable
    should be interpreted or whether its definition has changed
    over the years. We leave these decisions to data
    professionals.
will be demonstrated through the course of this new video.

Besides using the newly designed charting/plotting capability to demonstrate the project's objectives, the video itself features an innovative chart configuration tool through which many usage scenarios are possible.

With the chart configuration tool, usage scenarios are demonstrated that show users ways of detecting anomalies, checking definitions, and locating the sources of problems.

Be aware, however, that not all information is available in the IPEDS data files. When in doubt, the IPEDS documentation should be consulted.

If you have any questions or comments, please feel free to contact me at: socialdatacenter at(@) gmail.com
========

Wednesday, August 22, 2018

REST or Representational state transfer

After working on my project using REST frameworks, I realized that my implementation could become awkward if I relied heavily on the HTTP GET method, even though the stateless approach made good sense for the project.

Because of this awkwardness, I began to look up problems with REST. One of the articles I ran into is 'RESTful APIs, the big lie'. A few things described in the article resonate with my development experience. After reading the comments on the article, it is clear that there are quite a few perception/understanding issues around REST.

To get a bit of clarification, I decided to see what Wikipedia has to say about REST. Sorry, I have practical projects to work on and no intention of spending my time on theoretical debates.

Here's what I got out of Wikipedia:
REST is largely what the Web is today (follow the constraints section) -
    Client-Server
      - This is obvious.
    Statelessness
      - Except for applications that use server-side session storage.
      - Cookies are OK since they are client side.
    Cacheability
      - Applications may not always set it explicitly, but it is there.
    Layered system
      - HTTP fulfills it.
    Code on demand (optional)
      - Basically, JavaScript, or others in the early days.
    Uniform interface
      Resource identification in requests
        - This calls for identifying resources, which the URL basically fulfills.
      Resource manipulation through representations
        - A working application that does not rely on server sessions will meet this constraint.
      Self-descriptive messages
        - Wikipedia quotes the example of media type - well, the implications are many.
        - A media type is just a code - that means all other info/code can be hard-coded.
        - I guess we can stretch this to say that some message standard is to be specified.
      Hypermedia as the engine of application state
        - This is like saying a home/root page is desired.
        - It serves as the root from which to discover all other resources.
Applied to Web services
    As should be clear by now, REST does not call for HTTP.

Most of today's so-called RESTful APIs/web services are based on HTTP, but they don't have to be. Also, being based on HTTP does not make something RESTful. Since HTTP is not called for, the use of GET/POST/PUT/DELETE should not be a criterion either, even if it is what most people try to do. Now it makes me wonder what role HTTP played in Roy Fielding's dissertation.

For me, I think I am happy with what I have after reading all this. From my assessment of my own project, I would say that what mine doesn't have at this point is the following:
A home page that can reveal/lead to all resources.
A message standard that is specified/published.
Other than that, I think I am fine. Besides, I don't really want to reveal my message standard or provide a home page if I am not interested in making the API public. I may only reveal these to my business partners.
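To make the two missing pieces a little more concrete, here is a minimal sketch of a stateless handler with a home (root) document that leads to every other resource. It is not my project's API: the resource paths, the "rel" names, and the JSON layout are invented for illustration, and the handler is a plain function with no HTTP server behind it, in keeping with the point that REST does not call for HTTP.

```python
import json

# Hypothetical resource paths and titles, chosen only for illustration.
RESOURCES = {
    "/colleges": {"title": "List of colleges"},
    "/measures": {"title": "Available measures"},
}


def handle(path):
    """Stateless handler: the reply depends only on the request itself."""
    if path == "/":
        # Root document: one link per resource, so a client can start here.
        body = {
            "links": [
                {"rel": "item", "href": href, "title": info["title"]}
                for href, info in sorted(RESOURCES.items())
            ]
        }
    elif path in RESOURCES:
        # Every representation links back to the root.
        body = dict(RESOURCES[path], links=[{"rel": "home", "href": "/"}])
    else:
        return 404, "application/json", json.dumps({"error": "not found"})
    # The declared media type is the published part of the message standard.
    return 200, "application/json", json.dumps(body)


def discover(start="/"):
    """Follow the root's links, as a client with no hard-coded URLs would."""
    status, media_type, text = handle(start)
    return [link["href"] for link in json.loads(text)["links"]]


found = discover()
print(found)
```

Keeping the root document private, or handing it only to business partners, would not change the handler at all; it only changes who is told where to start.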

Saturday, July 7, 2018

Why do I Disable Windows Defender on Windows 8.1

Begin

This morning I noticed that my computer was running quite slowly. A quick look at the Task Manager showed that Windows Defender was taking a big share of the resources. I was concerned but did not want to spend time dealing with it, hoping that it would quiet down after some disk scanning. Just a quick note: CPU usage was about 25%, while disk usage was quite high, at times staying at 100%.

In the evening, I noticed that not much had changed. This really bothered me. The CPU usage was not the issue, but heavy disk usage for more than 10 hours?

I began to Google for ways to stop/disable/uninstall the service. I then learned that you can't really uninstall the thing: it has been said that it is an integrated part of Windows 8.1, and people who took over the registry and folders to get rid of it actually found themselves with a dead PC. When I did locate a few articles about disabling the thing, I found that my version of the software did not provide the options mentioned in those articles. I suspect that my version must have gone through several upgrades and some of those options must have been removed.

At that point, I was totally frustrated and thinking that 'absolute power corrupts absolutely'. It is not that Microsoft did not mean well. However, when you believe you know better and do not give users options, there will be times when you miscalculate.

In this case, however, I was fortunate enough to locate the article '3 Ways to Disable Windows Defender on Windows 8/8.1'. The Group Policy method did work and I was able to disable the software.

* A side note. I was trying to work on folder and file permissions in order to stop the software, but failed. The thought? Well, I think it is very interesting in the sense of social behavior. The 'administrator' used to be considered GOD and was, presumably, given all the power to do things. These days, the administrator is no longer trusted and is not, in any plainly obvious way, given all the power to do everything. The administrator is now guarded against by Microsoft through additional layers of security complexity. The registry and Group Policy are now the higher layers of that complexity. Without mastering or tinkering with these layers, the administrator's rights are limited.

One big problem with these additional layers of security complexity is that they cannot be understood logically; it is a matter of guessing, trying, and memorizing. They are also subject to change whenever the software updates. These settings are totally in the hands of the software. How the software uses the registry and Group Policy values determines whether the administrator has the rights or not. The rights of the administrator are no longer a given, and this adds load to administrators' shoulders.

* Question: For the human race, is this approach a step forward?

End

Thursday, April 5, 2018

a Drupal wiki bug solved - flexifilter

Begin

As is known, I began to work on my Always Organizer project located at KickStarter.com.

I began to look into the wiki work that has been done for the Drupal platform. I found a few good articles that I was not aware of in the past, when I was busy working for an unworthy agency - Nebraska Postsecondary.

Anyway, all the things I learned are going into my wiki - my way of keeping things organized, which is the purpose of my Always Organizer project.

Today, the thing that really made me happy is that I have contributed to the Drupal project: I provided a patch for a Flexifilter issue that had not been resolved for 6 years. See here for details.

Flexifilter is a very important component for doing wiki in Drupal. I am glad to contribute and to move my project forward.

End

Wednesday, March 28, 2018

A step toward organizing information in the digital age

As the world has moved toward everything digital, people have envisioned going paperless for quite a while now.
To a large degree, for structured data, the world has made great strides. Databases and applications have been created successfully to host that data, and people everywhere are adopting those systems well.

The observation, however, is that for un-structured data, adoption has been a slow process, especially in the realm of organizing and using it.

For example, meeting notes, book notes, and class notes are kinds of un-structured data.

In the note-taking industry, there is no shortage of note-taking gadgets on the market, but most are just traditional note taking with gadgets.

In terms of organizing and using un-structured data, I would point to two articles of interest:
10 Ways to Improve How You Manage Information
which talks about how 'Information Management is a Hallmark of Better Productivity'
Electronic Lab Notebooks
where the University of Cambridge is trying to help researchers save/store their notes

The University of Cambridge was unable to recommend a single piece of software for the University to adopt.

The meat of my suggestion is the use of a wiki-style system to organize notes or similarly un-structured data. Personally, I have used such a system for ages and, although it is not perfect, I have found it very useful.

Before I present my wiki-style note system, let's step back and look at how people have been taking notes and how they have been organizing them.

In the case of book notes or study notes, people/students write notes on paper with some kind of structure, if they so choose. Once notes are taken, to organize them, people can easily put related notes together in folders and file them in a cabinet, by category or alphabetically if they want.

In the case of meeting notes or notes of a similar nature, notes can be taken in much the same fashion as book notes or study notes, but the filing and organizing can make a real difference. For example, a meeting may cover various topics/projects. In these cases, the note taker may like to split up the notes so that he or she can file them under the appropriate topic or project. However, doing so tears apart the notes of the meeting, and it becomes difficult to see the overall picture of the meeting.

The multi-topic nature of meeting notes isn't as unique as it may sound. For example, when filing an article away, a reader may notice that the article covers two topics, like education and workforce. By filing the article under either the education folder or the workforce folder, the reader risks not finding that article the next time he looks for it, since he may be looking in the wrong folder.

After reviewing a few cases of how people deal with paper notes, let's take a look at what a wiki system can do for us.

First of all, a wiki is basically just a quick way to create linked web pages (read Wikipedia if you need to), where web pages are just like Word documents in which you can type, highlight, format, etc. The thing about wiki web pages, however, is that linking is second nature to them, and creating a document is as easy as creating a link.

When applying the wiki principle to the meeting note scenario, we could create, in each project page, a link to the meeting note page. In this way, every project page will be aware of the decisions made in that meeting while the meeting page stays intact. The other approach would be to cut and paste each meeting decision into the relevant project page while creating, in the meeting page, links to each project page. In either case, you can always find those decisions whether you follow the meeting page or the project page.

In the case of filing an article away, you can easily create, in both the education page and the workforce page, links to the article page. In this way, you can always reach the article page no matter which way/page you choose.

When working on book notes or study notes, a link can easily be created to a prerequisite knowledge page, which can help learning tremendously.
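The linking idea in the scenarios above can be sketched in a few lines. This is only an illustration: the page names, the meeting date, and the [[Page Name]] link syntax are assumptions for the example, not the format of any particular wiki system. The sketch builds an index of which pages link to each page, which is what lets the meeting page or the article page be reached from every topic that mentions it.

```python
import re
from collections import defaultdict

# Hypothetical pages; [[Name]] is an assumed link syntax.
PAGES = {
    "Meeting 2018-03-20": "Decisions on both projects.",
    "Project A": "Scope notes. See [[Meeting 2018-03-20]] for decisions.",
    "Project B": "Budget notes. See [[Meeting 2018-03-20]] for decisions.",
    "Article: Skills gap": "Notes on the article.",
    "Education": "Reading list: [[Article: Skills gap]].",
    "Workforce": "Reading list: [[Article: Skills gap]].",
}

LINK = re.compile(r"\[\[([^\]]+)\]\]")


def backlinks(pages):
    """Map each page name to the sorted names of the pages that link to it."""
    incoming = defaultdict(set)
    for name, text in pages.items():
        for target in LINK.findall(text):
            incoming[target].add(name)
    return {target: sorted(sources) for target, sources in incoming.items()}


index = backlinks(PAGES)
print(index["Meeting 2018-03-20"])
print(index["Article: Skills gap"])
```

The meeting page stays in one piece, yet both project pages lead to it, and the article is filed once while remaining reachable from both the education and the workforce pages.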

Filing or organizing information in a wiki is great. But how about presentation, in terms of the 'five hat racks' LATCH approach? In general, the user can arrange pages in all five arrangements (Location, Alphabetically, Timeline, Category, and Hierarchically) manually at the same time, since all we have are web pages and a presentation is just a set of links to the pages. However, using a selected approach for the appropriate pages can make the presentation more logical.

Depending on the implementation, some wiki systems also handle pictures and files.

A retrospect:
Back in the days of folders and short file names, I would build a folder hierarchy and try to store/organize files/information hierarchically. With short file names, I constantly had to create a text file, _ReadMe.txt, to define and describe the content of the folder. With the introduction of Windows 95, the _ReadMe.txt file could actually be a web page that provided links to and descriptions of the folder's content. Together with file system shortcuts, the folder web pages provided a fair information organization system. At the time, I did not try to see whether it was possible to link folder pages. But, as you can see, the idea is very close to what I have today with wiki web pages: using web pages with links to document/organize your information.

A project at Kickstarter.com:
Always-Organizer

Saturday, February 17, 2018

First try on vala

I have known about Vala for quite a few years. I don't remember exactly how or why I ran into it. I believe I spent a bit of time reading about it and thought this was exactly the language I would like to use for all my personal projects.

I am not sure whether that thought will change. But here are a few things I liked about Vala when I first learned about it:
It is open source.
It is cross-platform (Windows and Linux).
It supports classes in a way similar to C# and Java rather than C++
- even though I like C++'s native look.
It can do GUI in the way of classes - through GUI packages.

For this article, I will just describe the steps I took to finally test my first Vala program. I tried to follow the instructions I could find but, somehow, I ran into a few problems that bothered me a bit. I did get them resolved, but I am not totally confident that this is what is supposed to happen, or whether I missed something obvious.

I started with the Vala installation instructions for Windows. Those call for installing the MSYS2 system first. The instructions for installing MSYS2 can be found here. They call for running the command 'pacman -Syuu'. A few articles pointed out that pacman is the package manager command used by Arch Linux, and this led me to read the man page here. Unfortunately, reading that did not give me confidence that -Syuu is the correct option to use, which led me to suspect that I should probably be looking for reference information about the pacman command on MSYS2-related websites. At this point, I haven't found appropriate references, but I do know that using -Syuu let me get as far as testing my first Vala program. After finishing with MSYS2, I headed back to the Vala installation instructions for Windows and installed the following Vala packages in MSYS2:

pacman -S mingw-w64-x86_64-gcc 
pacman -S mingw-w64-x86_64-pkg-config
pacman -S mingw-w64-x86_64-vala

I then headed to this page to test the first Vala program.

After creating the program file, I issued the valac command to compile the program, and the shell reported that valac was not recognized as a command. This hinted that the problem might be caused by the PATH variable in the MSYS2 shell environment. After checking the syntax for setting the path in Linux, I entered the following command:
$ export PATH=$PATH:PathToMSys2/mingw64/bin/
and that solved the problem, and a .exe file was generated.

To test the .exe file, I started a DOS window and typed in
>my_first_program
which failed with a message about a missing .dll file.

After modifying the DOS path with this command:
path=PathToMSys2\mingw64\bin;%path%
I was able to run the .exe file successfully.

* Since I used the zipped file instead of the installer to install MSYS2, it seems reasonable that the path info would not be present in MSYS2 or DOS, even though I did need to run the package management program, which could have detected the path and set the path info for both the MSYS2 and DOS environments.
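Both fixes come down to the same thing: adding the mingw64 bin folder to a list of search folders, with ':' as the separator in the MSYS2 shell and ';' in the DOS window. Here is a minimal sketch of that idea, written in Python only so the logic can be tried anywhere. The add_to_path helper is made up for this example, PathToMSys2 is the same placeholder used in the commands above, and the starting path values are invented.

```python
def add_to_path(path_value, directory, sep, front=False):
    """Return path_value with directory added, unless it is already listed."""
    entries = [entry for entry in path_value.split(sep) if entry]
    if directory in entries:
        return sep.join(entries)
    return sep.join([directory] + entries if front else entries + [directory])


# MSYS2 shell: ':' separator, folder appended (like export PATH=$PATH:...).
msys_path = add_to_path("/usr/local/bin:/usr/bin", "PathToMSys2/mingw64/bin", ":")

# DOS window: ';' separator, folder placed first (like path=...;%path%).
dos_path = add_to_path(
    r"C:\Windows\system32;C:\Windows", r"PathToMSys2\mingw64\bin", ";", front=True
)

print(msys_path)
print(dos_path)
```

Whether the folder goes first or last only matters when two folders hold a program or .dll with the same name; the first match in the list wins.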

End
