Tuesday, September 18, 2018

Higher Education IPEDS College Data UI and RESTful API - definition, charting, demonstration


Updated 04/21/2019: A demonstration of the project can now be found here.

* 11/12/2018: Imported the newest IPEDS data (as of Oct. 24, 2018) into the system. An article demonstrating the use of these new data can be found at EdPond.blogspot.com.

This article describes my motivation, my efforts, and my ongoing work on an IPEDS College Data User Interface (UI) and Application Programming Interface (API). The application consists of a client, through which users interact with the data, and a server, which is implemented as a RESTful API.


The IPEDS college data is, arguably, the most important data source on United States colleges. The data serves not only researchers but also members of the general public who want to learn about colleges. It appears in various reports, such as the US News College Ranking and the College Scorecard.

The IPEDS college data has been available for many years but, like much government data that resides on the Internet, it largely lacks a user-friendly application. This seriously restricts the value of the data, since people can't easily arrange it into a digestible form. It is these inconveniences that made me want to build an infrastructure for government data in general.


A system was built around the statistical programming language R, and it carries many of my pioneering ideas. That system, however, cannot easily be extended to a modern web-based infrastructure - hence the current project.

With the desire to build a new system, I started the project with the postsecondary IPEDS college data as the focus. To begin, a fundraising attempt was launched on Kickstarter.com. Lacking support from IPEDS participating institutions, I took the work on myself.

As with many long-standing data collections, the IPEDS college data suffers from problems such as definition changes and backward compatibility. As IPEDS insiders know, resolving every issue that comes with the survey would take a tremendous amount of effort. This project does not aim to resolve all of those problems; it aims to provide easier access to the data.
 
At the outset of the project, two user groups are our major concern:
  1.  Students and parents who are looking for colleges.
  2.  Researchers.
These two groups will have different scenarios for using the college data and will likely require different user interfaces or applications. One thing they have in common, however, is the need to have all the data in a database for easy retrieval.

The first goal of this project is to get the college data into a database. We can then address the accessibility issue. Importing the college data into the database is not painless since, as we all know, not all data are clean. The process has been largely refined, as demonstrated in my first YouTube video. It isn't totally automated, but it is good enough for most purposes - do be prepared for interruptions when there are problems in the raw data.
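To make the import step concrete, here is a minimal sketch in Python of loading one year's survey file into a database and stopping when a raw row looks wrong. It is not the project's actual importer: the table layout, the function name import_survey_csv, the RawDataError class, and the sample columns INSTNM and ENROLL are all assumptions made for illustration. Only the idea of keying rows by the institution ID column (UNITID) comes from the IPEDS files themselves.

```python
import csv
import io
import sqlite3


class RawDataError(ValueError):
    """Raised when a raw row cannot be loaded without manual review."""


def import_survey_csv(conn, csv_text, year):
    """Load one year's survey file into a long-format table (hypothetical layout).

    Each CSV row becomes one (unitid, year, variable, value) record per column.
    Returns the number of records inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS college_data ("
        "unitid INTEGER, year INTEGER, variable TEXT, value TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for line_no, record in enumerate(reader, start=2):  # line 1 is the header
        if None in record:
            # DictReader stores surplus fields under the key None.
            raise RawDataError(f"line {line_no}: more fields than header columns")
        unitid = (record.get("UNITID") or "").strip()
        if not unitid.isdigit():
            # Stop here so a person can inspect the raw file.
            raise RawDataError(f"line {line_no}: bad UNITID {unitid!r}")
        for variable, value in record.items():
            if variable == "UNITID":
                continue
            rows.append((int(unitid), year, variable, (value or "").strip()))
    with conn:  # commit all rows together, or none if an insert fails
        conn.executemany("INSERT INTO college_data VALUES (?, ?, ?, ?)", rows)
    return len(rows)


# Made-up sample data, not real IPEDS records.
sample = "UNITID,INSTNM,ENROLL\n1001,Alpha College,5000\n1002,Beta University,12000\n"
conn = sqlite3.connect(":memory:")
inserted = import_survey_csv(conn, sample, 2018)
```

Raising an error on the first bad row, rather than skipping it silently, mirrors the "be prepared for interruptions" behavior described above: nothing is written until the whole file has passed the checks.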


For students and parents, our goal is to allow viewing and comparing multiple colleges so they can make informed decisions.

For researchers, our goal is to provide tools that help identify definition changes and retrieve college data across multiple years.
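One simple way to surface possible definition changes is to compare a variable's label text from year to year. The Python sketch below assumes the yearly data dictionaries have already been read into plain dicts; the function name find_label_changes and the sample variables are hypothetical, and the system's real tooling may work differently. Note that it flags wording changes only: a definition can change while its label stays the same, so the IPEDS documentation remains the final reference.

```python
def find_label_changes(dictionaries):
    """Return {variable: [(year, label), ...]} for variables whose label changes.

    `dictionaries` maps year -> {variable name: label text}.
    Labels are compared ignoring case and repeated whitespace.
    """
    history = {}
    for year in sorted(dictionaries):
        for variable, label in dictionaries[year].items():
            normalized = " ".join(label.split()).lower()
            entries = history.setdefault(variable, [])
            # Record a year only when the wording differs from the previous entry.
            if not entries or entries[-1][2] != normalized:
                entries.append((year, label, normalized))
    return {
        variable: [(year, label) for year, label, _ in entries]
        for variable, entries in history.items()
        if len(entries) > 1
    }


# Made-up labels for illustration only.
yearly_labels = {
    2016: {"ENROLL": "Total enrollment", "TUITION": "Tuition and fees"},
    2017: {"ENROLL": "Total  enrollment", "TUITION": "Tuition and fees"},
    2018: {"ENROLL": "Total fall enrollment", "TUITION": "Tuition and fees"},
}
changes = find_label_changes(yearly_labels)
```

In this sample only ENROLL is reported, with the first year of each distinct wording, which gives a researcher a short list of years to check before building a multi-year trend.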


Once the IPEDS data is loaded into the database, it is then a matter of how to retrieve the needed information. The design decisions made here can make or break the usefulness of the system.

To entice the average user to use the college data system, the interface should have a low learning curve. It is always true that the more you know, the more easily you can adapt to or make better use of a system. However, given the busy schedules of today's population, a low learning curve is essential for the majority of users.


Before the project started, most of the data access logic was implemented and tested in the programming language R with command-line console interfaces. The goal of this new project is to provide users with a tried-and-true graphical user interface.

As the project progressed, videos were made to demonstrate its usability. The project won't reach its final stage until the user interface is finalized.

The first progress report can be found at UI+API for IPEDS College Data - Definition, Trend, History, - a progress report. The IPEDS college data system demonstrates a search/filter front end that does not require prior knowledge of the IPEDS college data survey. After the desired measurements have been selected, the user interface provides a summary table in which the user can click to pick the allowed refinements for those measurements. The user can then retrieve the college data for viewing, as demonstrated in the video. The goal of the video is to show the capability rather than the operation, as the operation will be refined to provide an even better user interface for less tech-inclined users.
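The search/filter idea can be illustrated with a small Python sketch: the user types plain words and the system matches them against plain-language measure titles, so no survey codes need to be known in advance. The function name search_measures and the catalog entries are hypothetical and are not taken from the actual system.

```python
def search_measures(catalog, query):
    """Return (name, title) pairs whose title contains every word of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    return [
        (name, title)
        for name, title in catalog.items()
        if all(term in title.lower() for term in terms)
    ]


# Hypothetical catalog of measure names and plain-language titles.
catalog = {
    "ENROLL": "Total fall enrollment",
    "TUITION_IN": "In-state tuition and fees",
    "TUITION_OUT": "Out-of-state tuition and fees",
}
matches = search_measures(catalog, "tuition state")
```

Here the query matches both tuition measures and leaves out enrollment; the matched list is the kind of input a summary table could then offer for further refinement.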

The second progress report can be found at UI+API for IPEDS College Data - Chart, Trend, Definition - a progress report. It not only shows the added charting capability of the college data access system but also demonstrates various scenarios for using that capability to examine the retrieved college data. Again, some of the operations will be refined to provide an even better user experience. The chart configuration interface, however, should already give users a very positive experience with the college data system. By comparison, the first video left the user with a list of records that had to be processed with other software to make better sense of the data. The charting capability removes that need. The ease of use of the chart configuration table also makes the project very user friendly.

Please bear in mind that this is just a seed project. I intend to include more government data based on a similar data processing scheme.



====== vvvvv Scratches vvvvvvv =========

This project is about building the basic infrastructure for general government data with the federal IPEDS college data as the pilot data source.

The IPEDS (Integrated Postsecondary Education Data System) survey has been around since the 1980s and has collected a lot of data from United States colleges, public and private. The data is of interest not only to researchers but also to students and parents.

IPEDS data is publicly available and is used in many reports, such as the US News College Ranking and the College Scorecard.

Even though the IPEDS data is available to the general public, like most government data that resides on the Internet, it usually requires a certain level of data processing skill to make good use of it.

The purpose of this project is to reduce the barriers for both the general public and researchers. Please view the project video to see what we know is working and how we were able to produce some reports from our current system. The goal of this project is to extend the limited data we have and to provide a better user interface for our users.

As mentioned in the video, we are targeting two groups of users: researchers, and students and parents in general.

For students and parents, we will provide tools that allow them to compare colleges.

For researchers, we will provide tools to help them deal with multi-year trend data.

This project is a foundation project and it is for the social good. By adding more government data to the system, citizens can become better informed and understand our society better.

Supporting this project gives you access to the data and helps us continue funding the development.

Thank you for your support.

* All rewards are early-bird subscriptions of one year or less. All surveys published by IPEDS in the most recent 10 years are available with summaries. Generated reports must cite our URL for accountability - we make sure our data matches the ultimate source. Personal research can't be cited or used by an employer. Personal account results can't be published or shared. An institution account means data at the institution level. A sector account means data at the sector level.

Risks and challenges

As mentioned in the project video, we have, for the most part, tested our approach with limited data. That testing demonstrated our ability to solve the problems we encountered. Undeniably, there will be more obstacles like the ones we have already run into. Some may relate simply to data clean-up and some may be more technical, but we are confident we can resolve them. Of course, we will need a dedicated web server to host the data online and to provide the data API - one year's cost is included. If support exceeds our goal, the extra money could be used to support the operation.


================
IPEDS (Integrated Postsecondary Education Data System) data is arguably the most important data source for learning about United States colleges. The data has been available for years. However, without a user-friendly interface or application, the data, like most raw data, is likely under-used.

The goal of this project is to build a user-friendly interface/application based on my past experience in dealing with federal data.

The first video is a buy-in pitch that details the vision, the prospects, and the considerations.

The second and following videos serve as progress reports and, possibly, as tutorials once the system is made available.

Comments welcome ... SsocialDataCenter at(@) gmail.com

===============
This video, which is also available on the Kickstarter project page, introduces a project that aims to make IPEDS US college data easier to use and access.

As mentioned in the project, the IPEDS data has been available for a long time, but a user-friendly application hasn't been readily available. The project builds on a general database scheme that can be extended to other datasets.

For IPEDS, it is intended to provide users with an easily accessible online database and a user-friendly interface that can retrieve data and generate charts.

User support for the project is needed not just to build the system but also to cover the operating cost of keeping the online database on Internet servers. Updating the database also costs money, as does improving the application.

===============
This video demonstrates an IPEDS access UI. The project is under construction. The project takes a very general approach, which means it can easily be adapted to other datasets. The project presents the data as it is and does not try to make decisions for researchers about how a variable should be interpreted or whether its definition has changed over the years. We leave these decisions to data professionals. Be aware, however: not all information is available via the IPEDS data files. When in doubt, the IPEDS documentation should be consulted.

Questions and Comments welcome ... SsocialDataCenter at(@) gmail.com

===============
This video presents a progressive enhancement to an IPEDS access UI project, which takes a general approach to data organization and can therefore be adapted to other datasets easily.

Many of the stated objectives:
    The project presents the data as it is and does not try
    to make decisions for researchers about how a variable
    should be interpreted or whether its definition has changed
    over the years. We leave these decisions to data
    professionals.
will be demonstrated through the course of this new video.

Besides using the newly designed charting/plotting capability to demonstrate the project's objectives, the video itself features an innovative chart configuration tool through which many usage scenarios are possible.

With the chart configuration tool, usage scenarios are demonstrated that show users ways of detecting anomalies, checking definitions, and locating root causes.

Be aware, however: not all information is available via the IPEDS data files. When in doubt, the IPEDS documentation should be consulted.

If you have any questions or comments, please feel free to contact me at:  socialdatacenter at(@) gmail.com

========