Thursday, April 5, 2018

A Drupal wiki bug solved - Flexifilter

Begin

As some of you know, I have started working on my Always Organizer project, which is hosted at Kickstarter.com.

I began by looking into the wiki work that has been done for the Drupal platform, and found a few good articles that I was not aware of back when I was busy working for an agency that was not worth the effort - Nebraska Postsecondary.

Anyway, everything I learn goes into my wiki. That is my way of keeping things organized, and it is the purpose of my Always Organizer project.

Today, the thing that really made me happy is that I contributed to the Drupal project: I provided a patch for a Flexifilter issue that had not been resolved for 6 years. See here for details.

Flexifilter is a very important component for building a wiki in Drupal. I am glad to contribute and to move my project forward.

End

Wednesday, March 28, 2018

A step toward organizing information in the digital age


As the world moves toward everything digital, people have envisioned going paperless for quite a while now.
To a large degree, for structured data, the world has made great strides. Databases and applications have been created successfully to host such data, and people everywhere are adopting those systems well.

For unstructured data, however, adoption has been slow, especially when it comes to organizing and using the data.

Meeting notes, book notes, and class notes are all examples of unstructured data.


In the note-taking industry, there is no shortage of note-taking gadgets on the market, but most of them just do traditional note taking on a gadget.

In terms of organizing and using unstructured data, two articles are of interest:
  10 Ways to Improve How You Manage Information
    which talks about how 'Information Management is a Hallmark of Better Productivity'
  Electronic Lab Notebooks
    where the University of Cambridge is trying to help researchers save and store their notes


The University of Cambridge, however, was not able to recommend a single piece of software for the University to adopt.

The meat of my suggestion is to use a wiki-style system to organize notes and similar unstructured data. Personally, I have used such a system for ages and, although it is not perfect, I find it very useful.

Before I present my wiki-style note system, let's step back and look at how people have been taking notes and how they have been organizing them.

In the case of book notes or study notes, people write notes on paper, with some kind of structure if they choose. Once the notes are taken, organizing them is easy: related notes go together in folders, and the folders are filed in a cabinet by category or alphabetically.

Meeting notes, and notes of a similar nature, can be taken in the same fashion as book notes or study notes, but filing and organizing them is a different matter. For example, a meeting may cover various topics or projects. In that case, the note taker may want to split the notes up so that each part can be filed under the appropriate topic or project. However, doing so tears the meeting notes apart, and it becomes difficult to see the overall picture of the meeting.

The multi-topic nature of meeting notes isn't as unique as it may sound. For example, when filing an article away, the reader may notice that the article covers two topics, such as education and workforce. By filing the article under either the education folder or the workforce folder, the reader risks not finding it the next time, since he or she may be looking in the wrong folder.


After reviewing a few cases of how people deal with paper notes, let's take a look at what a wiki system can do for us.

First of all, a wiki is basically just a quick way to create linked web pages (read Wikipedia if you need to), where a web page is just like a Word document: you can type, you can highlight, you can format, etc. The thing about wiki pages, however, is that linking is second nature, and creating a new page is as easy as creating a link.


When applying the wiki principle to the meeting note scenario, we can create, in each project page, a link to the meeting note page. In this way, every project page is aware of the decisions made in that meeting while the meeting page stays intact. The other approach is to cut and paste each meeting decision into its project page and to create, in the meeting page, links to each project page. In either case, you can always find those decisions whether you start from the meeting page or from the project page.

In the case of filing an article away, you can easily create links to the article page in both the education page and the workforce page. In this way, you can always reach the article page no matter which page you start from.

When working on book notes or study notes, a link to a page of prerequisite knowledge is easy to create and can help learning tremendously.
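To make the linking idea concrete, here is a minimal Python sketch of a toy wiki. It is only an illustration, not how any real wiki engine stores its pages; the Wiki class, its method names, and the page titles are all made up for this example.

```python
from collections import defaultdict


class Wiki:
    """A toy wiki: page titles, page text, and one-way links between pages."""

    def __init__(self):
        self.pages = {}                  # title -> text
        self.links = defaultdict(set)    # title -> titles it links to

    def add_page(self, title, text=""):
        # Keep the existing text if the page is already there.
        self.pages.setdefault(title, text)

    def link(self, source, target):
        # Creating a link also creates any missing page, so making
        # a new page is as easy as making a link.
        self.add_page(source)
        self.add_page(target)
        self.links[source].add(target)

    def pages_linking_to(self, target):
        return sorted(s for s, targets in self.links.items() if target in targets)


wiki = Wiki()
wiki.add_page("Meeting 2018-03-28", "Decisions for Project A and Project B")
wiki.link("Project A", "Meeting 2018-03-28")   # the meeting page stays intact
wiki.link("Project B", "Meeting 2018-03-28")
wiki.link("Education", "Article")              # one article, two topics
wiki.link("Workforce", "Article")

print(wiki.pages_linking_to("Meeting 2018-03-28"))
print(wiki.pages_linking_to("Article"))
```

The meeting page and the article each exist once, yet both can be reached from every topic page that links to them, which is exactly what paper folders cannot do.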

Filing and organizing information in a wiki is great. But how about presentation, in terms of the Five Hat Racks (LATCH) approach? In general, the user can arrange pages in all five ways (Location, Alphabet, Time, Category, and Hierarchy) manually and at the same time, since all we have are web pages and each presentation is just a page of links. However, choosing the appropriate arrangement for each set of pages makes the presentation more logical.

Depending on the implementation, some wiki systems also handle pictures and files.


A retrospect:
Back in the days of folders and short file names, I would build a folder hierarchy and try to store and organize files and information hierarchically. Because file names were short, I constantly had to create a text file, _ReadMe.txt, to define and describe the content of each folder. With the introduction of Windows 95, the _ReadMe.txt file could actually become a web page that provided links to, and descriptions of, the contents of the folder. Together with file system shortcuts, these folder web pages made a fair information organization system. At the time, I did not try to see whether it was possible to link the folder pages to each other. But, as you can see, the idea is very close to what I have today with wiki pages: using web pages with links to document and organize your information.

A project at Kickstarter.com:
Always-Organizer


Saturday, February 17, 2018

First try on Vala


I have known about Vala for quite a few years. I don't remember exactly how or why I ran into it. I believe I spent a bit of time reading about it and thought this was exactly the language I would like to use for all my personal projects.

I am not sure whether that thought will change. But here are a few things I liked about Vala when I first learned about it:
  It is open source.
  It is cross-platform (Windows and Linux).
  It supports classes in a way similar to C# and Java rather than C++
    - even though I like C++'s native look.
  It can do GUI work through classes, via GUI packages.

For this article, I will just describe the steps I took to finally test my first Vala program. I tried to follow the instructions I could find but, somehow, I ran into a few problems that bothered me a bit. I did get them resolved, but I am not totally confident that this is what is supposed to happen, or whether I missed something obvious.

I started with the Vala installation instructions for Windows, which call for installing the MSYS2 system first. The instructions for installing MSYS2 can be found here. They call for running the command 'pacman -Syuu'. A few articles pointed out that pacman is the package manager used by Arch Linux, which led me to read the man page here. Unfortunately, reading it did not give me confidence that -Syuu is the correct option to use, and I suspect I should be looking for reference information about the pacman command on MSYS2-related websites. At this point, I haven't found an appropriate reference, but I do know that -Syuu got me to the point of testing my first Vala program. After finishing with MSYS2, I headed back to the Vala installation instructions for Windows and installed the following packages in MSYS2:

pacman -S mingw-w64-x86_64-gcc 
pacman -S mingw-w64-x86_64-pkg-config
pacman -S mingw-w64-x86_64-vala

I then headed to this page to test the first Vala program.

After creating the program file, I issued the valac command to compile the program, and the shell reported that valac was not recognized as a command. This hinted that the cause might be the PATH variable in the MSYS2 shell environment. After checking the syntax for setting the path in Linux, I entered the following command:
    $ export PATH=$PATH:PathToMSys2/mingw64/bin/
That solved the problem, and a .exe file was generated.

To test the .exe file, I opened a DOS window and typed
>my_first_program
which failed with a message about a missing .dll file.

After modifying the DOS path with this command:
path=PathToMSys2\mingw64\bin;%path%
I was able to run the .exe file successfully.
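Both problems above came down to a folder missing from the search path. As a quick way to check that before compiling, here is a small Python sketch that looks a command up on PATH with one extra folder placed in front. The find_on_path helper and the C:\msys64 location are assumptions for illustration only; substitute wherever your MSYS2 actually lives.

```python
import os
import shutil


def find_on_path(command, extra_dirs=()):
    """Return the full path of `command`, searching extra_dirs before PATH."""
    search_path = os.pathsep.join([*extra_dirs, os.environ.get("PATH", "")])
    return shutil.which(command, path=search_path)


# Hypothetical install location; substitute your own MSYS2 folder.
mingw_bin = r"C:\msys64\mingw64\bin"

for name in ("valac", "gcc"):
    found = find_on_path(name, extra_dirs=[mingw_bin])
    print(name, "->", found or "not found; check PATH")
```

If a tool shows up only when the extra folder is included, that folder is the one to add to PATH in the MSYS2 shell or the DOS window.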

* Since I used the zipped file instead of the installer to install MSYS2, it seems reasonable that the path information would not be present in either MSYS2 or DOS. Even so, I did have to run the package management program, which could have detected the path and set it for both the MSYS2 and DOS environments.


End


Monday, October 30, 2017

Sweet and short R set-diff code


I was working on a project in R and making extensive use of the 'names' property of a vector. Basically, I use the vector to build a menu from which the user picks the desired choice. To give the user enough information, lines and lines of information are embedded in the names property.

All of this worked well until I let the user repeatedly pick items to be removed from the list/vector.

A reasonable approach would be to use R's set operation setdiff() to remove the picked items from the original vector/list. Unfortunately, the setdiff() function removes all names from the result.

Here's my solution: v1 <- v1[!is.element(v1,Picked_items)];

In this way, all names are retained.

Don't you just love R? No loop is needed.
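For comparison, here is the same idea sketched in Python. This is an analogy and not a translation of the R internals: a dict stands in for the named vector, and the labels and values are made up.

```python
# A named "vector": labels (the R names) mapped to the values the user picks from.
v1 = {
    "Apple - red, sweet": 1,
    "Banana - yellow, soft": 2,
    "Cherry - small, tart": 3,
}
picked_items = [2]

# Set difference on the values alone loses the labels, like setdiff() in R.
values_only = set(v1.values()) - set(picked_items)

# Filtering the mapping keeps every surviving label attached to its value.
v1 = {label: value for label, value in v1.items() if value not in picked_items}

print(values_only)
print(v1)
```

As in the R one-liner, the trick is to filter the original container by a membership test instead of asking for a set difference.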

Wednesday, May 17, 2017

Unix sed, Python CsvKit under Windows Scripting Host (WSH); Problems and solutions


This is just a short article describing the problems and findings I encountered in one of my projects, in the hope that what I found will be useful to someone following a similar path. A short side track: nerds, or people like me who work and share knowledge, are often dismissed as anti-social and as doing less charity or volunteer work for society. I, for one, would like to pass on the message that we, the nerds and genuine hard workers, should be proud of our contribution to society and to making the world a better place.

Back to the topic.

I was involved in a Windows automation project and was writing most of my code in VBA. However, with my knowledge of the Unix way of doing things, it made total sense for me to want to run some tasks through Unix utility programs. As we all know, the Unix way basically means the command line. Fortunately, Windows did not forgo access to the command line. For VBA, there is the general Shell() function, but that is not the only option. With Microsoft's COM infrastructure, a VBA programmer has access to a wide variety of objects. One of them is the Windows Scripting Host. Through the Windows Scripting Host, the programmer has better control of the DOS-shell/command-line environment.

While I was happily using the Windows Scripting Host to carry out my command line tasks, I noticed that Windows/DOS-based programs all ran fine, but Unix utilities that had been ported to the Windows/DOS world did not.

Before I go on, I would like to point out that I did not spend a lot of time trying to figure out every single issue, so please bear with me for not being able to provide all possible solutions and explanations.

One Unix utility I used is the sed command from MinGW/MSYS. I was able to verify that it worked by running some tests directly at the Windows/DOS command line, after removing some conflicting search paths from the PATH environment variable - for example, on my system, I also have Qt and GNAT installed.

After verifying sed at the Windows/DOS command line, I invoked it through the Windows Scripting Host via VBA. The command failed with a return code of 2, which for sed could just mean errors during execution or, for DOS, could mean file not found. By testing with a non-existent command, I confirmed that the Windows Scripting Host did recognize the sed command. By testing with a simple 'sed -help 2>file', I realized it might have something to do with the shell's interpretation/parsing of the command line. This led me to the thought of using Cmd.exe /C to run sed. By running 'Cmd.exe /C sed ...  ' under the Windows Scripting Host, everything worked out. The other commands with which I ran into the same problem are the Python CsvKit commands. Again, running them through Cmd.exe solved the problem. At this point, I can't say I understand the problem. My guess is that it has something to do with the redirection of standard output, since all my Unix and Python commands used redirection.
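The redirection guess can be illustrated outside VBA. The Python sketch below is not the Windows Scripting Host itself; it only shows the general behaviour that '>' is handled by a shell (Cmd.exe on Windows) and is otherwise passed to the program as an ordinary argument. The child process is a stand-in for sed that simply reports the arguments it received, and the file name is arbitrary.

```python
import os
import subprocess
import sys
import tempfile

workdir = tempfile.mkdtemp()
out_file = os.path.join(workdir, "out.txt")

# A stand-in for sed: a child process that reports the arguments it received.
child = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]

# 1) No shell: ">" and the file name arrive as plain arguments.
direct = subprocess.run(child + ["hello", ">", out_file],
                        capture_output=True, text=True)
print("without shell:", direct.stdout.strip())
print("file created: ", os.path.exists(out_file))

# 2) Through a shell (cmd.exe on Windows, sh elsewhere): ">" is a redirection.
command = (f'"{sys.executable}" -c "import sys; print(sys.argv[1:])" '
           f'hello > "{out_file}"')
subprocess.run(command, shell=True, check=True)
with open(out_file) as handle:
    print("with shell:   ", handle.read().strip())
```

In the first call the program itself receives '>' and a file name it never expected, which is the kind of input that could make sed stop with an error. In the second call the shell consumes the redirection before the program starts, which is what wrapping the command in Cmd.exe /C achieves.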



Monday, March 20, 2017

A better replication function: rep() for R

This article proposes a new approach to the R replication function rep(). At this point, I am not fluent in creating R packages and will not, for a while, create one, even though I will try to provide the code I have in mind.

First of all, let's review the current implementation of the R replication function rep() in my own words:
  rep(Vctr,times=Vctr1,each=n,length.out=N)
Each element in Vctr is repeated n times if Vctr1 does not exist. Otherwise, the number of times each element is repeated is specified by Vctr1. However, if times is a single number, the whole Vctr is repeated that number of times. When length.out is given, times is disregarded: each element is repeated n times, and the result is repeated again until length.out is reached.
From my description above, I see that when 'times' is a vector, its elements control how the corresponding elements in Vctr are repeated. But not when 'times' degrades to a single number - it then controls the number of times the whole 'Vctr' is repeated. On the other hand, 'each', as a number, also controls the number of times each element in Vctr is repeated. With this information, it is only logical for me to want to reconsider the situation when 'times' is just a number. It seems more logical to me to treat it as a special case where all elements in Vctr are repeated the same number of 'times', i.e.
    Vctr1= 5 := c(5, 5, 5, ...).
With this equivalence, I would also like to propose switching the meanings of 'times' and 'each', so that 'each' now describes how many times each of the elements in Vctr should be repeated. With the meaning of the new 'each' settled, we should now reconsider the meaning of the new 'times'.

The newly proposed meaning of 'times' is the number of times to repeat the sequence generated by 'each'. The new meaning of length.out is the maximum length of the eventual output.

With the proposed changes outlined above, here are a few examples to demonstrate the new new_rep() function.

>new_rep(1:5,each=2,times=1,length.out=13)
[1] 1 1 2 2 3 3 4 4 5 5

>new_rep(1:5,each=2,times=2,length.out=13)
[1] 1 1 2 2 3 3 4 4 5 5 1 1 2

>new_rep(1:5,each=c(1,2,3,2,1),times=1,length.out=13)
[1] 1 2 2 3 3 3 4 4 5

>new_rep(1:5,each=c(1,2,3,2,1),times=2,length.out=13)
[1] 1 2 2 3 3 3 4 4 5 1 2 2 3

Can we create everything that can be done with the old rep() function? - yes!
Can we create something that is not possible with the old rep() function? - yes

Possible algorithm:
new_rep <- function(Vctr, each = 1, times = 1, length.out = NULL) {
    if (length(each) == 1) {  # each is just a single number
        each <- rep(each, times = length(Vctr))  # expand each to one count per element
    }
    Rslt <- rep(Vctr, times = each)    # repeat every element by its own count
    Rslt <- rep(Rslt, times = times)   # then repeat the whole sequence
    Lngth <- length(Rslt)
    if (is.numeric(length.out)) {      # length.out only ever shortens the result
        if (length.out < Lngth) Lngth <- length.out
    }
    Rslt[seq_len(Lngth)]
}
I would appreciate your thoughts, and the possible creation of a package.
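For anyone who wants to try the proposed semantics outside R, here is a Python sketch of the same algorithm. It is a port for experimenting, not a reference implementation; the parameter is spelled length_out because Python names cannot contain a dot, and the length check on 'each' is an extra safeguard added here.

```python
from itertools import chain, repeat


def new_rep(values, each=1, times=1, length_out=None):
    """Python port of the proposed rep(): apply 'each' first, then 'times'."""
    values = list(values)
    if isinstance(each, int):
        each = [each] * len(values)    # a single number applies to every element
    if len(each) != len(values):
        raise ValueError("'each' must be a number or match the length of values")
    # Repeat every element by its own count, then repeat the whole sequence.
    once = list(chain.from_iterable(repeat(v, n) for v, n in zip(values, each)))
    result = once * times
    # length_out is a maximum: it can shorten the result but never extend it.
    return result if length_out is None else result[:length_out]


print(new_rep(range(1, 6), each=2, times=2, length_out=13))
print(new_rep(range(1, 6), each=[1, 2, 3, 2, 1], times=2, length_out=13))
```

The two calls correspond to the second and fourth examples above.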

Wednesday, March 30, 2016

Labels for ACS PUMS csv download - Data Ferrett to the rescue


The US Census Bureau conducts the American Community Survey (ACS) annually. One of the data products made available to the public is the Public Use Microdata Sample (PUMS) data, which researchers can use to derive statistical results.

The PUMS data are made available in two formats: SAS and CSV. The SAS file format is proprietary and contains long text that gives meaning to the shortened mnemonics for variables and categorical values. The CSV file format, on the other hand, is supported by almost all software but does not contain the helping text. The US Census Bureau does provide documents in PDF or text format that describe the mnemonics used. However, the PDF and text formats aren't the easiest for statistical software to use.

Personally, I believe the XML format is ideal for storing the descriptions of the mnemonics, and I have suggested this approach to the ACS help group. In the meantime, I discovered that the Data Ferrett application provided by the Census Bureau can serve as an alternative source for such an XML file. By selecting all variables in Data Ferrett and saving the session to a local file, you obtain an XML file that contains the descriptions and the mnemonics.

By using XSLT, people can easily construct statements that, when run by statistical software, will assign the descriptions to the mnemonics.

* A side note: at this point, the XML file generated by Data Ferrett isn't perfect. The & sign isn't correctly encoded according to the XML standard (it should be written as &amp;), and some XSLT software may complain about the file. This, however, can easily be corrected by a search and replace on the & sign.
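A blind search and replace would also mangle any & that is already encoded correctly, so a slightly more careful replacement is worth a few lines. Here is a minimal Python sketch; the XML fragment, its element names, and the escape_bare_ampersands helper are made up for illustration and do not reflect the actual layout of the session file.

```python
import re
import xml.etree.ElementTree as ET

# An "&" that does not already start an entity such as &amp; or &#38;
BARE_AMPERSAND = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text):
    """Encode stray & signs while leaving existing entities untouched."""
    return BARE_AMPERSAND.sub("&amp;", xml_text)


# Made-up fragment; the real session file has its own element names.
broken = '<variable name="VAR1"><label>Science & Engineering &amp; Math</label></variable>'
fixed = escape_bare_ampersands(broken)

print(fixed)
root = ET.fromstring(fixed)    # parsing the unfixed text raises ParseError
print(root.find("label").text)
```

Once the file parses cleanly, it can be handed to whatever XSLT processor you prefer.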