Teapot Verses Life: 2011

Saturday, 29 October 2011

Coursework Part 1

Understanding Web 1.0 in the context of Public Libraries:

Introduction:

While the internet has had a disruptive effect on commercial areas such as publishing and the music industry, it has helped pull public libraries out of the 90’s lull, returning them as centres of education and providing users with access to electronic services such as the World Wide Web. In this essay I will be exploring the aspects of Web 1.0 which have affected library services by following the process of information, from the webpage itself to the end result and assessing its relevance to the user.

A person attempting to find a book in a public library would approach the OPAC (Online Public Access Catalogue) terminal and use methods of Information Retrieval to retrieve the information from the bibliographic Database, which is usually hosted online and can be accessed through the Internet and the World Wide Web.

Internet and the World Wide Web:

Firstly, how is a functioning webpage created in order for it to be accessed by someone using an internet browser?

To convert a page of text into a webpage, it needs to be written in a mark-up language. One example is HTML (another is XML), which is an SGML-based language. The simplest webpage that could be made with HTML would look like:
Title of
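As a rough sketch only (not necessarily the exact page described above), here is what a minimal HTML document might contain, held in a Python string so that it can be checked by a program. The title and body text are placeholders I have assumed, and `page_title` is a hypothetical helper name.

```python
from html.parser import HTMLParser

# A hypothetical minimal page; the title and body text are placeholders.
MINIMAL_PAGE = """<html>
<head>
<title>Title of page</title>
</head>
<body>
<p>Hello, library users.</p>
</body>
</html>
"""


class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only keep text that appears between <title> and </title>.
        if self.in_title:
            self.title += data


def page_title(html_text):
    """Return the title of an HTML document, or an empty string."""
    parser = TitleParser()
    parser.feed(html_text)
    return parser.title.strip()


if __name__ == "__main__":
    print(page_title(MINIMAL_PAGE))
```

Saving the string as a .html file and opening it in a browser would show the body text in the window and the title in the title bar.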

Once designed as above, this can be saved as an .html or .htm file and hosted on a server, where it can be accessed through its URL. For example, the URL for the Westminster Libraries E-Catalogue is:

http://elibrary.westminster.gov.uk/uhtbin/webcat

The http stands for HyperText Transfer Protocol, the most recognisable protocol (alongside FTP and POP3). The ‘elibrary.westminster.gov.uk’ section is the domain name, which the DNS (Domain Name System) translates into the address of the server. The rest is the path to the resource on that server. In this case, the internet browser acts as a Client and sends a request to the computer at the address in the URL. The Server constantly runs an http ‘daemon’ which listens out for Client requests, and once it has detected a message the Server sends a response, in this case the information for the webpage.
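To make the parts of a URL concrete, here is a small sketch using Python's standard `urllib.parse` module on the catalogue address listed in the references. The function name `describe_url` is my own invention for illustration.

```python
from urllib.parse import urlsplit

# The catalogue address as given in the reference list.
CATALOGUE_URL = "http://elibrary.westminster.gov.uk/uhtbin/webcat"


def describe_url(url):
    """Split a URL into the protocol, the host name and the path on the server."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # which protocol the client should speak
        "host": parts.hostname,     # the domain name that DNS resolves
        "path": parts.path,         # where the resource lives on the server
    }


if __name__ == "__main__":
    for name, value in describe_url(CATALOGUE_URL).items():
        print(name, "=", value)
```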

Looking at the page source for the above website, it is mostly HTML with some scripts written in JavaScript, and most of the design is part of an external cascading style sheet (CSS), which is linked to the .html file but kept in its own .css file.

Databases:

To manage the entries within the library catalogue, they would be compiled within a database, which stores all entries centrally so there are no inconsistencies. Unlike spreadsheets, a database can contain millions of separate entries and the OPAC database above is no exception. However, a user of the library is more likely to use natural language to navigate the OPAC (Van Riel, 2008) and it would usually be the task of the information professional to query the database.

A database is a compilation of data tables, which are two-dimensional tables of data arranged in columns and rows. A column is the field, for example Author, whereas a row makes up a complete record spanning across the fields (for example: Fields, Factories_and_workshops, Kropotkin, Peter_Alexseivich, 1912 – each would fall under a column but are across a single row).

Where a spreadsheet is used for compiling the data in one place for calculations, graphs etc, a database is used to answer specific queries. SQL (Structured Query Language) is a common language for communicating with database management software. SQL can be used for building databases and data tables but it is most commonly used for querying the database with keywords such as SELECT, FROM and WHERE.
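As an illustration of SELECT, FROM and WHERE working together, here is a minimal sketch using Python's built-in `sqlite3` module. The `books` table, its columns and its sample rows are assumptions made up for the example; a real library catalogue would be far larger and would probably run on different database software.

```python
import sqlite3


def build_catalogue():
    """Create a tiny in-memory catalogue with a few illustrative records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO books VALUES (?, ?, ?)",
        [
            ("Fields, Factories and Workshops", "Kropotkin, Peter", 1912),
            ("Mutual Aid", "Kropotkin, Peter", 1902),
            ("Studying Using The Web", "Clegg, B", 2006),
        ],
    )
    return conn


def titles_by_author(conn, author):
    """SELECT picks the column, FROM names the table, WHERE filters the rows."""
    rows = conn.execute(
        "SELECT title FROM books WHERE author = ? ORDER BY year", (author,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(titles_by_author(build_catalogue(), "Kropotkin, Peter"))
```

Each row in the table is one record and each column is one field, which is the same structure described above.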

If the data you have stored is not homogeneous or is unstructured, then it may be more appropriate to access the data via other means, such as information retrieval.

Information Retrieval:

Information retrieval is the field exploring information seeking behaviour and its relevance to the user (whether it satisfies their information needs). Quite often in a public library, a user does not know precisely what they are looking for (Van Riel, 2008), therefore the information needs to be indexed before being entered into the database, along with the metadata and keywords which the user may be searching for.

There are times when a keyword being searched for is particular to the result, but may be confused under natural language searches. Take the example of Roman numerals, in this case Star Wars: Episode I. It is possible to search for this and find it through Best Match, but alternatively the query could be modified with +I to make sure the numeral is included (Clegg, 2006), or the whole query could be entered as a phrase using quotation marks: “Star Wars: Episode I”.

It is quite likely that someone accustomed to using internet search engines would instinctively use a Best Match technique, using natural language queries and then modifying the search terms if the results aren’t of relevance. However, there are other search modes which could be used. For example, Boolean keyword searching, which notably removes stop words (e.g. all, the) and uses logical operators and positional operators.

Logical operators are commands which refine the results you would get; for example, Anarchists AND Communists would return results which include both terms, whereas Anarchists OR Communists would return results with either, and Anarchists NOT Communists would return results which mention Anarchists but not Communists. Positional operators allow you to retrieve results which have keywords in relation to each other; for example, ADJ would return results with the keywords side by side, whereas SAME would return results with both keywords in the same bibliographic record.
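The logical operators map quite neatly onto set operations. The sketch below is only an illustration under my own assumptions: the three records, the stop word list and the function names are all made up, and a real retrieval system would do far more (stemming, ranking, positional operators).

```python
# Hypothetical records, keyed by record number.
RECORDS = {
    1: "anarchists and communists in the first international",
    2: "a history of the anarchists",
    3: "communists in europe",
}

# A tiny illustrative stop word list; real systems use longer ones.
STOP_WORDS = {"a", "and", "in", "of", "the"}


def build_index(records):
    """Map each keyword to the set of records that contain it."""
    index = {}
    for record_id, text in records.items():
        for word in text.split():
            if word not in STOP_WORDS:
                index.setdefault(word, set()).add(record_id)
    return index


def search_and(index, a, b):
    """Records containing both terms."""
    return index.get(a, set()) & index.get(b, set())


def search_or(index, a, b):
    """Records containing either term."""
    return index.get(a, set()) | index.get(b, set())


def search_not(index, a, b):
    """Records containing the first term but not the second."""
    return index.get(a, set()) - index.get(b, set())


if __name__ == "__main__":
    idx = build_index(RECORDS)
    print(search_and(idx, "anarchists", "communists"))
    print(search_or(idx, "anarchists", "communists"))
    print(search_not(idx, "anarchists", "communists"))
```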

Conclusion:

It always comes down to precision. An incorrect set of parameters in an SQL query would result in an error message, a badly worded search engine query would bring you irrelevant results and the smallest slip in the coding of HTML would result in a broken section of the webpage.

In 2008, Google announced that they would be updating their programming in order to capture database content (Devine and Egger-Sider, 2009), which would blur the lines between internet search engines and data retrieval from databases. If successful, it would become easier for users to procure relevant and precise information which satisfies their query. But of course, no matter how simple a task becomes, human error will always be present and it will still be up to information professionals to construct precise queries.

Bibliography and References:

http://philsci-archive.pitt.edu/2536/1/iimd.pdf (accessed 27/10/11)
Andy MacFarlane, Richard Butterworth, and Jason Dykes (2011) Lecture 02: The Internet and the World Wide Web. London: City University.
http://elibrary.westminster.gov.uk/uhtbin/webcat (accessed 25/10/11)
http://www.w3.org/People/Raggett/book4/ch02.html (accessed 26/10/11)
http://www.w3.org/TR/html4/intro/sgmltut.html (accessed 26/10/11)
http://www.isgmlug.org/sgmlhelp/g-sg.htm (accessed 26/10/11)
Andy MacFarlane, Richard Butterworth, and Anton Krause (2011) Lecture 03: Structuring and querying information stored in databases
http://sqlzoo.net/w.htm (accessed 26/10/11)
Andrew MacFarlane (2011) Lecture 04: Information Retrieval
Van Riel, R, Fowler, O and Downes, A (2008) The Reader-friendly Library Service. Newcastle upon Tyne: The Society of Chief Librarians.
http://library.indstate.edu/about/units/instruction/key.pdf (accessed 26/10/11)
Clegg, B (2006) Studying Using The Web. New York: Routledge.
Devine, J and Egger-Sider, F (2009) Going Beyond Google. London: Facet Publishing.

Blog URL: http://teapotverseslife.blogspot.com/

All images created by Shaun Condon for the purpose of this blog.

Monday, 10 October 2011

DITA - Week 3

When I attempted to query the database with the examples shown in the lecture, I quickly found that I was using the wrong words and descriptions for the entities and the data tables. For example, I am asked in Task 1.1 to use the field 'Company Name' which I tried as 'company name', 'CompanyName' and other variations before having to ask for help.

I used the command "show tables;" to find the first set of data tables, then specified that I wanted publishers in order to find the exact term for 'Company Name' using the command "desc publishers;".

+--------------+------------------+------+-----+---------+-------+
| Field        | Type             | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+-------+
| pubid        | int(10) unsigned | NO   | PRI | 0       |       |
| name         | varchar(100)     | YES  |     | NULL    |       |
| company_name | varchar(100)     | YES  |     | NULL    |       |
| address      | varchar(100)     | YES  |     | NULL    |       |
| city         | varchar(100)     | YES  |     | NULL    |       |
| state        | varchar(100)     | YES  |     | NULL    |       |
| zip          | varchar(30)      | YES  |     | NULL    |       |
| telephone    | varchar(30)      | YES  |     | NULL    |       |
| fax          | varchar(30)      | YES  |     | NULL    |       |
| comments     | text             | YES  |     | NULL    |       |
+--------------+------------------+------+-----+---------+-------+

Which gave me the first answer: Company Name is written as company_name. Now on with the tasks.

Task:

Develop SQL queries to return the following information:

1) A list of the PubID, Name, Company Name and City for all publishers based in the city of New York
2) A list of all fields for publishers named Prentice Hall
3) A list of the Title, Year and ISBN for all titles published in 1994
4) A list of the Title, Year, ISBN and PubID for all titles published since 1980 in year order
5) A list of all fields in the Titles table for books whose title begins with the word 'database' (regardless of upper/lower case letters)
6) A list of all fields in the Titles table for books with the word 'database' anywhere in the title (regardless of upper/lower case letters)
7) A list of the Title, Year Published and ISBN for all books with 'SQL' in the title written since 1990 in date order
8) A list of the Company Names of all publishers who have published books on programming since 1990
9) The name of the publisher who published a book with ISBN 0-0280074-8-4
10) The name of the author who wrote "A Beginner's Guide to Basic", listing also the ISBN and name of this book

Answers:

1) select PubID, Name, Company_Name, City
from publishers
where City = "New York";

2) select * from publishers where name = "Prentice Hall";

3) select title, year_published, isbn
from titles
where year_published = 1994;

(For 1.3, I managed to get the result, but the entries were too large to fit on the screen. The MySQL shell couldn't fit all the characters from the result on screen; I tried to rearrange them by adding \G or \g to the end of the command, but the result was still off screen.)

4) I originally tried "select title, year_published, isbn, pubid from titles where year_published >1979;" but found that the date was out of order (again the result was too large to fit on the screen). Instead I tried

select title, year_published, isbn, pubid
from titles
where year_published >1979 order by year_published asc;

... and found my result.

5) select *
from titles
where title like "database%";

6) This is similar, only to get an 'any' result I add another wildcard.

select *
from titles
where title like "%database%";

7) "select title, year_published, isbn from titles where title like "%SQL%", year_published >= 1990 order by year_published asc;" was my original guess, however it was not correct. While the columns in the select list (such as title and year_published) can be separated by a comma, the two conditions in the where clause, "%SQL%" and "year_published", had to be joined with an and.

select title, year_published, isbn
from titles
where title like "%SQL%" and year_published >= 1990 order by year_published asc;
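For anyone wanting to try the same shape of query without the lab database, here is a small sketch using Python's built-in `sqlite3` module. The `titles` table and its column names follow the ones used above, but the sample rows and the placeholder ISBNs are made up, and SQLite is standing in for the MySQL server used in the lab.

```python
import sqlite3


def make_titles_table():
    """Build a throwaway titles table with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE titles (title TEXT, year_published INTEGER, isbn TEXT)"
    )
    conn.executemany(
        "INSERT INTO titles VALUES (?, ?, ?)",
        [
            ("Learning SQL", 1995, "isbn-a"),       # placeholder ISBNs
            ("SQL for Beginners", 1991, "isbn-b"),
            ("Old SQL Manual", 1985, "isbn-c"),
            ("Database Design", 1996, "isbn-d"),
        ],
    )
    return conn


def sql_titles_since(conn, year):
    """Commas separate the selected columns; AND joins the two conditions."""
    query = (
        "SELECT title, year_published, isbn "
        "FROM titles "
        "WHERE title LIKE '%SQL%' AND year_published >= ? "
        "ORDER BY year_published ASC"
    )
    return conn.execute(query, (year,)).fetchall()


if __name__ == "__main__":
    for row in sql_titles_since(make_titles_table(), 1990):
        print(row)
```

The row from 1985 is dropped by the year condition and the row without 'SQL' in its title is dropped by the LIKE condition, so only rows passing both tests come back.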

8) select distinct company_name
from publishers, titles
where title like "%programming%" and publishers.pubid = titles.pubid and year_published >= 1990;

Had real trouble with this one. I knew I had to knit two tables together but struggled to remember learned lessons, such as that the SELECT bit can have entities separated by a comma, but from then on I must separate with AND. Also, finding out the right combination of primary key and foreign key was a step for the imagination.

9) select company_name
from publishers, titles
where publishers.pubid = titles.pubid and isbn like "0-0280074-8-4";

Similar to the previous one, I tied the two parts together with primary and foreign keys, the rest was rather simple.

10) I needed help with this one. With the previous question, I was only attempting to query the company name, whereas here I have to find the author's name as well as the isbn and title of the book. However, I did get a satisfactory answer.

select author, titles.isbn, title
from authors, title_author, titles
where authors.au_id = title_author.au_id and title_author.isbn = titles.isbn
and title = "A Beginner's Guide to Basic";

The part I still am trying to comprehend is the where section, with knitting together two parts of the table.
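One way to picture the where section: listing three tables after FROM pairs every row of each table with every row of the others, and the key comparisons in WHERE throw away all the pairings that do not belong together. The sketch below tries this out with Python's built-in `sqlite3` module. The table and column names follow the query above, but the rows are invented placeholders and SQLite is standing in for MySQL.

```python
import sqlite3


def make_library():
    """Three small tables linked by au_id and isbn, with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (au_id INTEGER PRIMARY KEY, author TEXT);
        CREATE TABLE titles (isbn TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE title_author (isbn TEXT, au_id INTEGER);

        INSERT INTO authors VALUES (1, 'Author One'), (2, 'Author Two');
        INSERT INTO titles VALUES ('isbn-a', 'Book A'), ('isbn-b', 'Book B');
        INSERT INTO title_author VALUES ('isbn-a', 1), ('isbn-b', 2);
        """
    )
    return conn


def author_of(conn, title):
    """Match authors to titles through the title_author linking table."""
    query = (
        "SELECT author, titles.isbn, title "
        "FROM authors, title_author, titles "
        "WHERE authors.au_id = title_author.au_id "   # author <-> link row
        "AND title_author.isbn = titles.isbn "        # link row <-> title
        "AND title = ?"                               # the book we want
    )
    return conn.execute(query, (title,)).fetchall()


if __name__ == "__main__":
    print(author_of(make_library(), "Book A"))
```

The title_author table holds nothing but pairs of keys, which is what lets it act as the bridge between the other two tables.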

Sunday, 9 October 2011

Adding CSS

I stumbled over this for a while, in my last post I mentioned how I was worried by CSS, well I had a reason.

However, I believe I have it.

There are two ways of adding CSS: External and Internal Style Sheets. If I were to add it internally, I would have to add it to the HTML for every page. Seeing as I have only 3 pages and am struggling, I have opted for External.

An example taken from W3 Schools for an Internal Style Sheet is this:

(head)
(style type="text/css")
hr {color:sienna;}
p {margin-left:20px;}
body {background-image:url("images/back40.gif");}
(/style)
(/head)

So the 'style type' bit identifies it as being CSS rather than something like JavaScript Style Sheets (JSSS), which Netscape 4 used but no one else did. The rest of the text then defines the look of the page: colour, margin, background image etc.

For External Style Sheets I had to create a .css file. I opened notepad and copied the Simple Style Sheet offered in the exercises for week 2 and saved it as stsh1.css (style sheet one), saved it to a folder called CSS. Then I had to link my pages to it.

(Actually - at this point - it automatically 'worked' when I opened up the page to test it; it turns out that as I'd been playing around with the CSS I had enough on there to make it work locally. It was only when I tried to upload it to the University computers that it crashed and died again.)

In the HEAD section of every page I created, I added this:

(LINK REL="STYLESHEET" TYPE="text/css" HREF="css/stsh1.css")

This is the link to the CSS file I have saved and this simple line is what turned my page from white space to something slightly more palatable.

Now I am afraid to play around with it for f34r of breaking it.

Monday, 3 October 2011

DITA - Week 2

The Internet and the Web:

Task One: "Find out about three of the following tags or elements and attributes and parameters that can be used within them."

Paragraphs and Line Breaks:

I feel comfortable with the tags for paragraph (p) and linebreaks (br), which are used to group text on the page; a new line in notepad doesn't translate to a new line in a webpage unless a linebreak or paragraph tag is added. An (hr) tag draws a horizontal line across the webpage.

Ones I am not so familiar with are Meta Information, Tables, Unordered Lists and Ordered Lists, while I can assume what they do, I have never used them.

Meta Information:

From what I can tell, Meta Tags will not show up in the main body of a webpage, it is also advised to put them into the HEAD section of the HTML. Their function is more behind the scenes than anything else, with smart keywords allowing for search engine optimisation to be more efficient.

Here are a few examples that I have found:

(meta name="description" content="Free Web tutorials" /)
(meta name="keywords" content="HTML,CSS,XML,JavaScript" /)
(meta name="author" content="Hege Refsnes" /)

Pretty self explanatory: the keywords are for search engine optimisation, so a search engine can match the page to a keyword search; I would place myself as the author of the webpage, and the description would be a quick few words describing my new webpage.
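Since meta tags never show up on the page itself, one way to see what a program (such as a search engine's crawler) could read from them is to parse the HEAD section. This is only a sketch using Python's standard `html.parser` module; the page wrapped around the three example tags and the helper name `read_meta` are my own assumptions, and angle brackets are used here because the parser needs real tags.

```python
from html.parser import HTMLParser

# The three example meta tags, placed inside a hypothetical page.
PAGE = """<html>
<head>
<meta name="description" content="Free Web tutorials" />
<meta name="keywords" content="HTML,CSS,XML,JavaScript" />
<meta name="author" content="Hege Refsnes" />
</head>
<body><p>Visible text.</p></body>
</html>"""


class MetaParser(HTMLParser):
    """Collects name/content pairs from meta tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also arrive here by default.
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes:
                key = attributes["name"].lower()
                self.meta[key] = attributes.get("content", "")


def read_meta(html_text):
    """Return a dictionary of the meta information found in a page."""
    parser = MetaParser()
    parser.feed(html_text)
    return parser.meta


if __name__ == "__main__":
    for name, content in read_meta(PAGE).items():
        print(name, ":", content)
```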

Unordered / Ordered List:

For the task ahead of me I have written a brief example of an unordered and an ordered list. A brief unordered list of demands, and an ordered list of the faux rebellion's demands.

(ul)
(li)More cheese on top of pasta bakes(/li)
(li)Less tax on space station duties(/li)
(li)Total acceptance of the new world order(/li)
(/ul)

(ol)
(li)Rebellion(/li)
(li)?????(/li)
(li)Profit!(/li)
(/ol)

One limitation of ordered and unordered lists that I have noticed is that they are limited to just bulletpoints and numbers. I wonder how these could be customised to include Roman numerals, for example, which may lie in another computer code.

Task Two:
Producing Some HTML:

As with last week, I have decided to do something a little different from the set task, rather than create a webpage for myself about myself, I have created a fake rebellion with some silly aims.

Firstly, I created a first page which will be the starting point of my webpage; I did this by using the original HEADTITLEBODY template provided. With some reasonable looking HTML code I imported that into EditPlus2, which highlighted the tags and made things a lot easier to do.

Unfortunately, I did not heed the advice of the tutors and mistakenly left out the closing / for one of my tags, oh the embarrassment.

Once I had my first page, I created an index page and linked the first page to it. I created the lists as shown above and played around with some of the meta tags.

(META NAME="Author" CONTENT="MrTeapot")
(META NAME="Keywords" CONTENT="New World Order, Rebellion, Take Over")
(META NAME="Description" CONTENT="A massive yarn on you all.")

I was rather impressed as I had only just discovered Meta Tags hidden in the page source and had no clue how to use them, but was soon adding tags to my pages.

CSS

I was worried by the introduction of CSS, and as I had no time to tackle it in lesson, I skipped that part. My webpage will be designed in my own time, I wished to publish my webpage for all to enjoy and skipped to the next task *slap on the wrist for me*.

Publishing

This was the most complicated part for me, understanding how webpages online are managed.

Firstly, I Mapped my W: Drive to allow me to just drag and drop any files I want into my public folder, which can be accessed online.

To actually publish, I had to open a program called Telnet. Once I had connected to unix.city.ac.uk and logged in, I could use a single keystroke to publish my html documents and access them again with the URL that leads to my public folder.

Phew. Time to go back and do some CSS stuff now.

Monday, 26 September 2011

First Lab

Uni Day 1:

I have gone through the first exercise in Digital Information Technologies and Architecture and am going to write up my thoughts and process.

It seems I am familiar enough with saving into various formats, but I am still discovering new information. ASCII was an area I had little understanding of, seems that is still the case but now I can say it is 'gobbledegook' and everyone else is using the same terminology.

So this is how I went through the exercise:

First of all, I created a Wordpad file with a bit of text - "Currently there are 21 people in the room" - which at the time was correct. Once closed in Wordpad, I opened the .txt file in Explorer and had a look:

{\rtf1\ansi\ansicpg1252\deff0\deflang2057{\fonttbl{\f0\fswiss\fcharset0 Arial;}}
{\*\generator Msftedit 5.41.15.1515;}\viewkind4\uc1\pard\f0\fs20 Currently there are 21 people in the room.\par
}

So far I recognise the text, the font style and everything else is 'gobbledegook'.

Next task was to open in Word and play around, I decided to embolden the text, change the type to Calibri (my preferred style) and made it a little larger.

Now I have changed a few variables, I open it into notepad and see what I was expecting...gobbledegook.

Íoo6¯h4ÇCÔvžDL±¤ eök)©l±×”86îÔ.ôš£ ôºüÐ
ÊUš>É0Í€ü"Sì+ a_Ýƒ( ›ÿÏvuÝ•øâÊc–¯TÈ/<¼!s¼ ÅX d 3‰´ ¯ƒ¬– ¡? 'g ![ ?óü 4ê¹úÇ%ë9Ž þ¶RŽk6Çð°$Cí, ú`& gë !/ =ÿ ÿÿ PK ! Ÿ :Ok word/document.xmlÌTÛŽÓ0 }Gâ ¢¼·IË %ÚvUµ»‚ PÕ]>ÀMœÆ‹í±l§¡ûõÌ8—í XU< )r<ÇsæÌ%¾¾ù©dtàÖ Ðóx2Nãˆë
¡÷óøûÃÝh GÎ3]0 šÏã#wñÍâí›ë&+ ¯ ×>B
í² ¢•÷&K —W\17 Ã5‚%XÅQÌþ¨Í( e˜ ;!…?&Ó4ý w40k«³Žb¤DnÁAéÉ%ƒ² 9ï–ÞÃ^ ·õ\w’CÄÄr‰ @»J ×³©eÃ «žäðZ %ûs¹$ZaYƒýP²•Ý€-Œ…œ;‡Öu Œ“ôµØ] ‰bð¸DÂË˜½ Å„ hh:Îú?4oŒÍKÚØ Q='‚µXà,í 8Òj¢&ÃY,¶ó8íž¸7m°Ñi:Y¿»]

This is only a small example. For an experiment I copy/pasted the characters from the notepad doc to a fresh Word doc. It came to 577 words stretched across 56 pages. All for the original 8 words "Currently there are 21 people in the room"

Here I just want to directly quote the text supplied after this task in the exercise, a little information about ASCII that I found useful.

"The binary information has not been translated into ASCII that we can easily interpret. Whilst some of the 7-bit strings are interpreted as orders of characters that make sense to us, many of them do not relate to ASCII codes that make any sense in combination and so we see all sorts of data that cannot be interpreted meaningfully as text. The document that we have created is a binary file that is interpreted by one particular word processor (MS Word), but does not apparently rely solely upon ASCII encoding. So we cannot read the document without 'Word' and other word processors may not understand the data format - the agreed way in which sequences of bits and bytes are translated into information that is understood by computer programs.

We have seen that there are ways of marking documents up in ASCII to add formatting. Here we will specifically save our document using one such format. HTML is a language that marks text up for presentation using agreed tags. It uses alphanumeric characters and so relies upon a text format like ASCII. We will learn more about HTML as the module progresses. It supports a document-centred view of information and is designed for creating documents that share files and refer to one another across the Internet."
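To see what the quoted text means by 7-bit strings, here is a small sketch that turns my original sentence into its ASCII codes and back again using nothing but built-in Python. The function names are my own; the only assumption is that the sentence contains plain ASCII characters, which it does.

```python
SENTENCE = "Currently there are 21 people in the room"


def ascii_codes(text):
    """Return the ASCII code number for every character in the text."""
    return list(text.encode("ascii"))


def to_bits(text):
    """Return each character as a 7-bit binary string."""
    return [format(code, "07b") for code in text.encode("ascii")]


def from_codes(codes):
    """Turn a list of ASCII code numbers back into readable text."""
    return bytes(codes).decode("ascii")


if __name__ == "__main__":
    print(ascii_codes(SENTENCE)[:9])   # codes for the word "Currently"
    print(to_bits(SENTENCE)[:9])       # the same letters as 7-bit strings
    print(from_codes(ascii_codes(SENTENCE)))
```

Plain text survives the round trip because every byte is a valid ASCII code; the Word file looked like gobbledegook because most of its bytes were never meant to be read that way.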

So rather than opening a Word document in notepad, the next stage is to identify some of the HTML. So I have opened the Word file and saved it as a Webpage, which can now be opened with notepad to reveal the HTML.

It looks more familiar than the above ASCII, I can see HEAD and BODY and a couple of /DIV thrown in. But there is a lot of text I am not familiar with (such as meta tags, w:, o:) and I get lost trying to find my original sentence. I found the words at the bottom of the page hidden in this paragraph.

[p class=MsoNormal style='margin-bottom:0cm;margin-bottom:.0001pt;line-height:
normal;mso-pagination:none;mso-layout-grid-align:none;text-autospace:none'][b][span
style='font-size:14.0pt;font-family:"Cambria","serif";mso-ascii-theme-font:
major-latin;mso-hansi-theme-font:major-latin;mso-bidi-font-family:Arial']Currently
there are 21 people in the room.[o:p][/o:p][/span][/b][/p]

Again, this is only for 8 words, but again I copied this into Word to check it out and found the total text in the notepad document comes to 1,528 words stretched across 18 pages.

Now we are going to play with images.

In my task, I went away from Weather text and kept with People In The Room, I found a nice picture of people in a room. Here it is, for any interested. Don't they look happy to be in a room?

Once I had got my image and saved it to the same folder, I then inserted it into the Word document with 'Link to File' - a shortcut that I have never used before, but keeps the two separate. If I change the image in the directory, it will change in the Word document, good tip. In fact, I think I'll quote the related text.

The 'Link to file' option means that the document does not physically contain or save the imagery independently of the original image file. Rather it saves a link to the location of the file on the computer and knows to incorporate the information from the file in which the image is stored when the document is displayed. This means that if the image file is changed, the image in the document will be updated. View the ASCII code generated by Word when it saved your document in Notepad.

Now I have linked the image into the Word document, I want to analyse the notepad version and look at the HTML.

My experience with HTML has all been online, I am used to searching for images in the HTML with tags resembling these:

{a href="http://www.flickr.com/photos/example/" title="IMAG1242 by The Username I Have, on Flickr"}{img src="http://farm7.static.flickr.com/numbers/morenumbers_c8bc96c91c.jpg" width="500" height="376" alt="IMAG1242"}{/a}

However, I am looking through notepad and can't see my image anywhere. I copy the notepad text and paste it into Word to check the word length and pages, on the assumption that if making a piece of text bold turns it into thousands of words, adding an image (or rather linking an image) will make it tens of thousands. In actual fact it adds 2 pages to the last version and only another 98 words. Very odd.

And that is the task. I have created a few files of varying formats, .txt, .docx, .jpg, and a html file. I have learnt about linking images to documents. Also, I am beginning to grasp ASCII, which up till now was mostly used by myself to draw pictures in notepad.

Introduction Post

Hello!

My name is Shaun. I have an online nickname of Teapot (or MrTeapot if you were being formal) so that is why I have chosen to call my blog 'Teapot Verses Life'. I hope that explains things for people if you thought that was a bit weird (not that being called Teapot is a normal thing to be).

So I'm doing Information Science as a postgrad course. I thought I was quite young at 24 to be doing a postgrad but it seems everyone else around me is in a pretty comfortable age range, some people younger, some people older, either way I'm not out on a limb here.

I work full time in two libraries for Westminster City Council, which must make my course part-time, so for now I'll only see you on Mondays. The libraries are in two very different areas, which gives me lots of chances to meet, and cater for, contrasting demographics. Of course, as they are public libraries, they are all mental cases, but at least I get to see two different types of mentals.

Anyway, I'm sure you'll find out more about me in person, this is getting a bit weird now.