Sunday, September 28, 2014

Piranha: Optimizing Short Jobs in Hadoop 0929


Posted on Phil's Story on 2014-09-29 at 2:27:42 AM.



Category: Hadoop Information
 

Actually, when I take notes on a paper, I usually write them in English as a reminder. In fact, since I started doing this, I remember less of what I understood from the paper than I used to. It seems odd that I forget the details of what I learned after taking notes. That's shit.

Notes on Piranha (VLDB 2013) by Khaled Elmeleegy, originally written in Korean

Summary

- As is well known, most companies such as Facebook, Yahoo, and Google rely heavily on large-scale MapReduce / Hadoop environments for distributed processing. Unlike before, higher-level languages such as Pig, Hive, and Jaql have matured, making it easy to query a distributed processing environment. Through this gradual progress, many companies now use these clusters as data warehouses or central processing units.

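However, although the underlying system (Hadoop) was built with a focus on large-scale jobs, most of its workload consists of small jobs. The Yahoo production cluster is one example: small jobs make up 80% of its workload, and most of them are online or ad hoc queries. The cause lies in the rise of higher-level languages. Because the aggregate data size is so large, running these queries on a single DBMS is impractical.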

 

Problems with the existing Hadoop environment for small jobs

- Scalability with respect to the job

- Reliability, so that jobs are fault tolerant

- Throughput

 

 Piranha’s Environment Characterization

- Small jobs are unlikely to suffer from failures, so Piranha avoids checkpointing intermediate results to disk.

- Piranha uses a simple fault-tolerance mechanism and extends Hadoop to support short jobs with a directed acyclic data flow graph.

- Tasks are self-coordinating instead of relying on the master for coordination.

 

 

Comment

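- The paper was not that difficult. Maybe it is because I have had a lot of exposure to this area, but unlike other papers it did not contain many technical methods. Whether the author writes well or simply explains things plainly, I think I understood more than 85% of it after six hours of reading. It somehow made me want to pick a research topic in this direction. Rather than grasping the overall tendencies of distributed systems and producing an optimal algorithm or mechanism, the idea is to identify a specific situation and improve the existing Hadoop environment to fit it, although of course I cannot observe such situations as freely as at Yahoo, where distributed processing is in everyday use.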

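The important thing seems to be the 'majority'. Working from what the majority of cases look like in a given environment, I think I should also consider ways to turn that environment into an advantage, rather than cleverly hunting for weaknesses in each situation.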

 

Saturday, September 27, 2014

20140927 Learning how to mount from my co-worker in detail

20140927 Learning how to mount from my co-worker in detail. Tips) How to install zshell !

With sudo privileges, check the disks on your Linux machine
>> fdisk -l
>> df -h

If the new disk is not partitioned yet, partition it first
>> fdisk /dev/sdb 
>> m for help
>> n for adding a new partition
>> p for primary
>> w

>> fdisk -l : check that the new partition shows up correctly


Format the new partition with a file system
>> mkfs.ext4 /dev/sdb1

That completes formatting the new disk.
Next, mount the new partition on a directory in the local file system.

How to do it?
- mkdir <folder name> somewhere on your local disk
- sudo mount -t ext4 /dev/sdb1 <folder name>/

Then change the folder's owner to your user
>>sudo chown jjoon ./<folder name>


Tips) How to install zshell !
http://gumdaeng.com/2014/05/23/zsh-install/



http://byjoo.tistory.com/entry/etcfstab (Linux basics)

  Thanks        gumdaengei

 Taking a note about xPad
Python lecture
2 Papers
Datamining
Textbook 4,5 Chapters

Degree (Graph theory)
- The degree of a vertex of a graph is the number of edges incident to the vertex, with loops counted twice.
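
As a reminder of how the "loops counted twice" rule works out, here is a minimal sketch. It assumes an undirected graph given as a list of (u, v) edge pairs; the function name vertex_degrees and the sample edges are made up for illustration.

```python
from collections import Counter


def vertex_degrees(edges):
    """Return a Counter mapping each vertex to its degree (undirected graph)."""
    degree = Counter()
    for u, v in edges:
        # Every edge adds one to each endpoint. For a loop (u == v) both
        # increments land on the same vertex, so the loop counts twice.
        degree[u] += 1
        degree[v] += 1
    return degree


# Hypothetical sample graph: a-b, b-c, and a loop on c.
edges = [("a", "b"), ("b", "c"), ("c", "c")]
degrees = vertex_degrees(edges)
print(degrees["c"])  # 3: one from b-c, two from the loop c-c
```

The degrees always add up to twice the number of edges, which is a quick sanity check.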

Hadoop) To run a program, it must be packaged as a .jar file, not a .java source file.

jar -cvf <file to create>.jar <folder name>


hadoop fs -copyFromLocal abc.txt /input
hadoop fs -copyToLocal /output .

hadoop fs -rmr
hadoop jar WordCount.jar /input/abc.txt /output

1) javac WordCount.java
2) jar cmf main.txt WordCount.jar WordCount*.class
main.txt is the manifest that sets which class is the Main class
3) in main.txt: Main-Class: WordCount

cd ~/.oh-my-zsh
./tools/theme_chooser.sh
vim ~/.zshrc // set ZSH_THEME here to choose a theme

In vim
commands :vs (vertical split), :sp (horizontal split)

Ctrl + z / fg
python multiply.py matrix.json | more
>>>help()
tmux new -s tmux_window
tmux attach -t tmux_window
Ctrl + B, then d (detach) or : (command prompt)

Friday, September 26, 2014

20140927 Feeling stupid

I believe that most students doing a Master's degree or a Ph.D. sometimes think they are really stupid when they face a troublesome problem.

These days, I really feel like that.
I even think my programming skills are worse now than when I was in my third year as an undergraduate.

Today's expression ========================================================

- I do not know what it is like to be a star.
- I sympathize with his thought and am disappointed at his way of working.
- Conceivably, a creative work is the thing for me.
- It is really interesting but at times rather time-consuming.
- I think it is high time that she went on a diet. (not "goes")
- As is true of any housework, gardening is tedious.

- Of the top 10 universities, Harvard University is counted as the best. / counts as the best.
- The number of Christians in Korea is estimated at about one thousand two millions.
- It is viewed as a useful alternative in improving efficiency.
- I do not know what it is like to act in cyberspace.
- People are not aware of the terrible effects human cloning has / could have
- Conceivably, a teaching job is the thing for me.
- It is really stereotyped but at times rather rewarding.
- I think it is high time that we introduced the new method into the firm / industry.
- As is true of any job, fund managers face a radical challenge as well.

- The most enjoyable aspect of my work is the fact that I am a freelance.
- Nowadays, job mobility is the rule (the prevailing trend) rather than the exception.
- Some people have a great sense of obligation, while others do not care about it at all.
- What is interesting to me may not be interesting to someone else.
- It seems like we got married yesterday. / It seems like only yesterday that we got married.

Tuesday, September 23, 2014

20140924

Today's expression ==========================================================

- Of major world religions Buddhism adopts a vegetarian diet.
- The number of vegetarians in Europe is estimated at several millions.
- There is certainly a tendency for vegetarians to have lower calorie intakes than people on a mixed diet.
- Receiving a present is the finest thing that can happen to children.
- That school educational system is generally recognized as (being) the most successful of its kind.
- It is viewed as a useful alternative in dealing with the troublesome problems of school.
- It is quite easy to differentiate between middle-class children and working-class children.
- This indicates the degree to which people are influenced by what they learnt as children.
- People are simply not aware of the terrible effects that nuclear bombs could have.
- I do not know what it is like to be a star / what it + be verb + like + to

Monday, September 22, 2014

20140923 It is what it is, right?

Today's expression ==========================================================

- Special care should be taken in interpreting economic statistics.
- It is only a matter of time before we can find any evidence.
- I might go and change the product, rather than going on feeling bad about it.
- If you keep talking about something for long enough, eventually people will pay attention to you. (not "telling something")
- Many people still consider it odd that a woman can make a political career.
- Sometimes, being married has more pitfalls than people realize.
- It will make driving more comfortable as well as safe.
- A solution to the problem / an answer to the question / a key to the problem / a clue to the accident
- A solution to the problem may lie in using an alternative means of transport: the bicycle.
- A problem has recently arisen with the British Royal Family.

- It tells well how the Korean economy has recovered.
- It is common knowledge that school education is collapsing / has collapsed.
- It bases the Korean educational system on the Japanese style.
- It is largely due to the fact that the Korean educational system is basically Japanese style.
- Special care should be taken in dealing with the problem of sex education.
- It is only a matter of time before we reach a consensus.
- If we keep making an effort in this way, eventually we will achieve our goal.
- Many people still consider it odd that a few teenagers dye their hair yellow.
- Working freelance has more pitfalls than people realize.
- Music makes our environment romantic as well as comfortable.
- A problem has arisen with inflation and the foreign exchange rate. (not "Inflation and the foreign exchange rate has arisen.")


Sunday, September 21, 2014

2014 0920 On the bus heading to Seoul to meet my friend

A diary entry written on the bus yesterday ======================================

Now I feel as if I am traveling to another city abroad rather than sitting in Korea. As I remember it, this is the same feeling I had on my trip from Montreal to Toronto two years ago, because I am listening to classical music without thinking of anything stressful, just as I did in Montreal.

Actually, it tells me how awesome my experience abroad was, with so many people cheering me on. When I think of Canada, especially Montreal, I tend not to think about what surrounds me now. This is largely because I could do whatever I wanted when I was there. For example, generally speaking, drinking a lot is always harmful to people. It can make them sick and cause them to lose part of their memory, which is called a blackout. However, even though I drank a lot with my Montrealer friends, I felt that my condition was better than it had been in Korea before I went there. I can guess the reason. Compared to my past life in Korea, I could probably tolerate any burden or pressure in Montreal because, as I pointed out, everything I did was something I wanted to do.

For some reason it is not beneficial to miss the past without recognizing the present.
In my experience, I am the sort of person who misses or reminisces about precious memories when I am emotional. Some people close to me have told me that feeling things so emotionally is a really good trait, whereas others say it can get in the way of what I have to do and of making important decisions.

Who knows the truth? There is a chance that my life in Montreal led me to achieve many things that are precious to me after I came back to Korea. It is true that I can get through many barely tolerable things when I dream of the Old Port of Montreal, where I used to go two years ago to forget something tough.

Friday, September 19, 2014

20140920 Today's expression

Today's expression==========================================================

- Teenagers are more influenced by the media than might be expected.
- National character is associated with the weather (or: with climate), either directly or indirectly.
- Damage varies according to location or region.
- People studying English are of every age and walk of life.
- The longer we defer a decision, the more serious the problem will be.
- When we think of the Internet, we tend to think of a computer.
- For whatever reason it is unacceptable to kill people.
- First of all, I would like to touch on the basic structure of the Korean educational system.
- This will help you decide which is the most desirable.
- There is a sharp difference between ways of dressing according to personality. 

- I believe it important that such claims (should) spread out or be widely known.
- It tells well how the procedures were ignored.
- It is common knowledge that US president Clinton is a womanizer.
- My resentment toward Karen is largely due to the fact that she occupies a room all by herself.
- Television is a vital factor in holding a family together.
- It is open to question whether the Central Bank increased (or raised) the interest rate last month.

Actually, I thought the data mining homework assigned two days ago would take a long time because we had to set up a lot of experimental environments such as Hadoop and gnuplot. In fact, it took just three hours. Before doing the homework last night, I watched a Korean National Ballet performance with my friend Eunbi. It was a great way to relieve the stress in my life.


PageRank

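- A search engine finds the pages that contain a search term, but finding the most popular pages is also an important function (Google's PageRank algorithm). A link from my page to another page is called an outlink or forward link, and the reverse direction also exists (an inlink).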

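Is a page popular just because it has many inlinks? No. Authority also differs depending on how well known the source web site is. That is, an inlink received from an authoritative page should be given a relatively higher weight, even though it is still just one inlink.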

That weight is applied after dividing it by the total number of outlinks of the page that sends the inlink.
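
To make the weighting concrete for myself, here is a rough sketch of the usual iterative PageRank computation. It is not taken from any particular implementation: the damping factor 0.85, the fixed iteration count, the even spreading of rank from pages without outlinks, the function name pagerank, and the sample link graph are all assumptions for illustration. The part that matches the note above is the line where a page's rank is divided by its number of outlinks.

```python
def pagerank(outlinks, damping=0.85, iterations=50):
    """Iteratively estimate PageRank.

    outlinks maps every page to the list of pages it links to; every
    link target is assumed to appear as a key as well.
    """
    pages = list(outlinks)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Base share every page receives (assumed damping model).
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in outlinks.items():
            if not targets:
                # Assumption: a page with no outlinks spreads its rank evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
                continue
            # The weight carried by each inlink: the sender's rank
            # divided by the sender's total number of outlinks.
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: C receives inlinks from A, B, and D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # the page with the highest rank
```

Note that A ends up ranked far above D even though each has at most one inlink, because A's single inlink comes from the highly ranked page C.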

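Shortest path, also known as minimum cost, used to evaluate network performance
- Computed with Dijkstra's algorithm (edges are relaxed taking path length into account; it computes the paths from a single source to all the remaining vertices)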
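
A minimal sketch of Dijkstra's algorithm with a priority queue, to go with the note above. It assumes non-negative edge weights and a graph stored as a dict of adjacency lists; the function name dijkstra and the sample graph are hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable vertex.

    graph maps a vertex to a list of (neighbor, weight) pairs;
    weights are assumed to be non-negative.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry; a shorter path to u was already found
        for v, weight in graph.get(u, []):
            candidate = d + weight
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


# Hypothetical weighted graph with single source "s".
graph = {
    "s": [("a", 4), ("b", 1)],
    "b": [("a", 2), ("c", 5)],
    "a": [("c", 1)],
    "c": [],
}
distances = dijkstra(graph, "s")
print(distances["c"])  # 4, via s -> b -> a -> c
```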